We derive novel conditions that guarantee convergence of the Sum-Product algorithm (also known as Loopy Belief Propagation
Sufficient conditions for convergence of the Sum-Product Algorithm



I. Introduction 

The Sum-Product Algorithm [2], also known as Loopy
Belief Propagation, which we will henceforth abbreviate 
as LBP, is a popular algorithm for approximate inference on 
graphical models. Applications can be found in diverse areas 
such as error correcting codes (iterative channel decoding 
algorithms for Turbo Codes and Low Density Parity Check 
Codes [3]), combinatorial optimization (satisfiability problems 
such as 3-SAT and graph coloring [4]) and computer vision 
(stereo matching [5] and image restoration [6]). LBP can be 
regarded as the most elementary one in a family of related 
algorithms, consisting of double-loop algorithms [7], GBP [8], 
EP [9], EC [10], the Max-Product Algorithm [11], the Survey 
Propagation Algorithm [4], [12] and Fractional BP [13]. A 
good understanding of LBP may therefore be beneficial to 
understanding these other algorithms as well. 

In practice, there are two major obstacles in the application 
of LBP to concrete problems: (i) if LBP converges, it is not 
clear whether the results are a good approximation of the 
exact marginals; (ii) LBP does not always converge, and in 
these cases gives no approximations at all. These two issues 
might actually be interrelated: the "folklore" is that failure 
of LBP to converge often indicates low quality of the Bethe 
approximation on which it is based. This would mean that if 
one has to "force" LBP to converge (e.g. by using damping 
or double-loop approaches), one may expect the results to be 
of low quality. 

Although LBP is an old algorithm that has been reinvented 
in many fields, a thorough theoretical understanding of the 
two aforementioned issues and their relation is still lacking. 
Significant progress has been made in recent years regarding 
the question under what conditions LBP converges (e.g. [14],
[15], [16])¹, on the uniqueness of fixed points [18], and
on the accuracy of the marginals [15], but the theoretical 
understanding is still incomplete. For the special case of a 
graphical model consisting of a single loop, it has been shown 
that convergence rate and accuracy are indeed related [19]. 

In this work, we study the question of convergence of LBP 
and derive new sufficient conditions for LBP to converge to a 
unique fixed point. Our results are more general and in some 
cases stronger than previously known sufficient conditions. 

II. Background 

To introduce our notation, we give a short treatment of
factorizing probability distributions, the corresponding
visualizations called factor graphs, and the LBP algorithm on factor
graphs. For an excellent, extensive treatment of these topics
we refer the reader to [2].

¹ After submission of this work, we became aware of [17], which
contains improved versions of results in [16], some of which are similar or
identical to results presented here (cf. Section V-B).

A. Graphical model 

Consider $N$ discrete random variables $x_i \in \mathcal{X}_i$, for $i \in \mathcal{N} := \{1, 2, \dots, N\}$; we write $x = (x_1, \dots, x_N) \in \mathcal{X} := \prod_{i\in\mathcal{N}} \mathcal{X}_i$. We are interested in the class of probability measures on $\mathcal{X}$ that can be written as a product of factors (also called potentials):

$$P(x_1, \dots, x_N) := \frac{1}{Z} \prod_{I\in\mathcal{I}} \psi^I(x_I). \tag{1}$$

The factors $\psi^I$ are indexed by subsets of $\mathcal{N}$, i.e. $\mathcal{I} \subseteq \mathcal{P}(\mathcal{N})$. If $I \in \mathcal{I}$ is the subset $I = \{i_1, \dots, i_m\} \subseteq \mathcal{N}$, we write $x_I := (x_{i_1}, \dots, x_{i_m}) \in \prod_{i\in I} \mathcal{X}_i$. Each factor $\psi^I$ is a positive function² $\psi^I : \prod_{i\in I} \mathcal{X}_i \to (0, \infty)$. $Z$ is a normalizing constant ensuring that $\sum_{x\in\mathcal{X}} P(x) = 1$. The class of probability measures described by (1) contains Markov Random Fields as well as Bayesian Networks. We will use uppercase letters for indices of factors ($I, J, K, \dots \in \mathcal{I}$) and lowercase letters for indices of variables ($i, j, k, \dots \in \mathcal{N}$).

The factor graph that corresponds to the probability distribution (1) is a bipartite graph with vertex set $\mathcal{N} \cup \mathcal{I}$. In the factor graph (see also Fig. 1), each variable node $i \in \mathcal{N}$ is connected with all the factors $I \in \mathcal{I}$ that contain the variable, i.e. the neighbors of $i$ are the factor nodes $N_i := \{I \in \mathcal{I} : i \in I\}$. Similarly, each factor node $I \in \mathcal{I}$ is connected with all the variable nodes $i \in \mathcal{N}$ that it contains and we will simply denote the neighbors of $I$ by $I = \{i \in \mathcal{N} : i \in I\}$. For each variable node $i \in \mathcal{N}$, we define the set of its neighboring variable nodes by $\partial i := \left(\bigcup N_i\right) \setminus \{i\}$, i.e. $\partial i$ is the set of indices of those variables that interact directly with $x_i$.
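The definitions above can be made concrete with a small brute-force sketch. The toy model below (three binary variables, two pairwise factors, and all function names) is an assumption made purely for illustration and does not appear in the paper; it evaluates the product of factors in (1), the constant Z, and the two neighbor sets.

```python
from itertools import product

# Hypothetical toy model (not from the paper): three binary variables 0, 1, 2
# and two pairwise factors, keyed by the tuple of variables they contain.
domains = {0: (0, 1), 1: (0, 1), 2: (0, 1)}
factors = {
    (0, 1): lambda a, b: 2.0 if a == b else 1.0,
    (1, 2): lambda b, c: 3.0 if b == c else 1.0,
}

def joint_states():
    """All joint states x = (x_0, x_1, x_2)."""
    return product(*(domains[i] for i in sorted(domains)))

def unnormalized(x):
    """Product of all factors psi^I(x_I), i.e. Z * P(x) in equation (1)."""
    p = 1.0
    for scope, psi in factors.items():
        p *= psi(*(x[i] for i in scope))
    return p

def partition_function():
    """Normalizing constant Z, computed by brute-force enumeration."""
    return sum(unnormalized(x) for x in joint_states())

def factor_neighbors(i):
    """N_i: the factors that contain variable i."""
    return [I for I in factors if i in I]

def variable_neighbors(i):
    """Neighboring variables of i: all variables sharing a factor with i."""
    return sorted({j for I in factor_neighbors(i) for j in I} - {i})
```

Brute-force enumeration is exponential in the number of variables, which is exactly why approximate schemes such as LBP are of interest.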

B. Loopy Belief Propagation 

Loopy Belief Propagation is an algorithm that calculates approximations to the marginals $\{P(x_I)\}_{I\in\mathcal{I}}$ and $\{P(x_i)\}_{i\in\mathcal{N}}$ of the probability measure (1). The calculation is done by message-passing on the factor graph: each node passes messages to its neighbors. One usually distinguishes between two types of messages: messages $\mu^{I\to i}(x_i)$ from factors to variables and messages $\mu^{i\to I}(x_i)$ from variables to factors (where $i \in I \in \mathcal{I}$). Both messages are positive functions on $\mathcal{X}_i$, or, equivalently, vectors in $\mathbb{R}^{\mathcal{X}_i}$ (with positive components). The messages that are sent by a node depend on the incoming messages; the new messages, designated by $\tilde\mu$, are given in terms of the incoming messages by the following LBP update rules³:

$$\tilde\mu^{j\to I}(x_j) \propto \prod_{J\in N_j\setminus I} \mu^{J\to j}(x_j) \tag{2}$$

$$\tilde\mu^{I\to i}(x_i) \propto \sum_{x_{I\setminus i}} \psi^I(x_I) \prod_{j\in I\setminus i} \mu^{j\to I}(x_j) \tag{3}$$

² In subsection IV-E we will loosen this assumption and allow for factors containing zeros.

Fig. 1. Part of the factor graph illustrating the LBP update rules (2) and (3). The factor nodes $I, J, J' \in \mathcal{I}$ are drawn as rectangles, the variable nodes $i, j, j', j'' \in \mathcal{N}$ as circles. Note that $N_j \setminus I = \{J, J'\}$ and $I \setminus i = \{j, j', j''\}$. Apart from the messages that have been drawn, each edge also carries a message flowing in the opposite direction.

Usually, one normalizes the messages in the $\ell_1$-sense (i.e. such that $\sum_{x_i\in\mathcal{X}_i} \mu(x_i) = 1$). If all messages have converged to some fixed point $\mu_\infty$, one calculates the approximate marginals or beliefs

$$b_i(x_i) = C^i \prod_{I\in N_i} \mu_\infty^{I\to i}(x_i) \approx P(x_i),$$

$$b_I(x_I) = C^I \psi^I(x_I) \prod_{i\in I} \mu_\infty^{i\to I}(x_i) \approx P(x_I),$$

where the $C^i$'s and $C^I$'s are normalization constants, chosen such that the approximate marginals are normalized in $\ell_1$-sense. A fixed point always exists if all factors are strictly positive [8]. However, the existence of a fixed point does not necessarily imply convergence towards the fixed point, and fixed points may be unstable.

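Note that the beliefs are invariant under rescaling of the messages

$$\mu_\infty^{I\to i}(x_i) \mapsto \alpha^{I\to i} \mu_\infty^{I\to i}(x_i), \qquad \mu_\infty^{i\to I}(x_i) \mapsto \alpha^{i\to I} \mu_\infty^{i\to I}(x_i)$$

for positive constants $\alpha$, which shows that the precise way of normalization in (2) and (3) is irrelevant. For numerical stability however, some way of normalization (not necessarily in $\ell_1$-sense) is desired to ensure that the messages stay in some compact domain.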

In the following, we will formulate everything in terms of the messages $\mu^{I\to i}(x_i)$ from factors to variables; the update equations are then obtained by substituting (2) in (3):

$$\tilde\mu^{I\to i}(x_i) = C^{I\to i} \sum_{x_{I\setminus i}} \psi^I(x_I) \prod_{j\in I\setminus i} \prod_{J\in N_j\setminus I} \mu^{J\to j}(x_j) \tag{4}$$

with $C^{I\to i}$ such that $\sum_{x_i\in\mathcal{X}_i} \tilde\mu^{I\to i}(x_i) = 1$. We consider here LBP with a parallel update scheme, which means that all message updates (4) are done in parallel.

³ We abuse notation slightly by writing $X \setminus x$ instead of $X \setminus \{x\}$ for sets $X$.
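As a minimal sketch of the parallel scheme built on update (4), the following code iterates factor-to-variable messages and then forms single-variable beliefs. The data layout (a dict of scopes to Python functions), the function names `lbp` and `beliefs`, and the small chain model are assumptions chosen for illustration only; they are not the authors' implementation. The example model is a tree, where LBP is known to reproduce the exact marginals.

```python
from itertools import product

def lbp(domains, factors, iters=20):
    """Parallel LBP on factor-to-variable messages, following update (4).

    domains: dict variable -> tuple of values
    factors: dict scope (tuple of variables) -> function of the values in scope order
    Returns mu[(I, i)][x_i], each message normalized to sum to one.
    """
    mu = {(I, i): {v: 1.0 / len(domains[i]) for v in domains[i]}
          for I in factors for i in I}
    for _ in range(iters):
        new = {}
        for (I, i) in mu:
            others = [j for j in I if j != i]
            msg = {}
            for xi in domains[i]:
                total = 0.0
                # Sum over the joint states of the other variables in I.
                for xs in product(*(domains[j] for j in others)):
                    x = dict(zip(others, xs))
                    x[i] = xi
                    term = factors[I](*(x[k] for k in I))
                    # Product of messages from all other factors J into each j.
                    for j in others:
                        for J in factors:
                            if j in J and J != I:
                                term *= mu[(J, j)][x[j]]
                    total += term
                msg[xi] = total
            norm = sum(msg.values())
            new[(I, i)] = {v: m / norm for v, m in msg.items()}
        mu = new  # parallel scheme: all messages are replaced at once
    return mu

def beliefs(domains, factors, mu):
    """Single-variable beliefs: normalized product of incoming messages."""
    out = {}
    for i in domains:
        b = {v: 1.0 for v in domains[i]}
        for I in factors:
            if i in I:
                for v in domains[i]:
                    b[v] *= mu[(I, i)][v]
        norm = sum(b.values())
        out[i] = {v: p / norm for v, p in b.items()}
    return out

# Hypothetical chain 0 - 1 - 2 with local evidence on variable 0.
chain_domains = {0: (0, 1), 1: (0, 1), 2: (0, 1)}
chain_factors = {
    (0,): lambda a: (1.0, 3.0)[a],
    (0, 1): lambda a, b: 2.0 if a == b else 1.0,
    (1, 2): lambda b, c: 3.0 if b == c else 1.0,
}
chain_beliefs = beliefs(chain_domains, chain_factors,
                        lbp(chain_domains, chain_factors))
```

Replacing all messages at once after each sweep is what distinguishes the parallel scheme from sequential schedules, which would overwrite messages in place.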



III. Special case: binary variables with pairwise interactions

In this section we investigate the simple special case of binary variables (i.e. $|\mathcal{X}_i| = 2$ for all $i \in \mathcal{N}$), and in addition we assume that all potentials consist of at most two variables ("pairwise interactions"). Although this is a special case of the more general theory to be presented later on, we start with this simple case because it illustrates most of the underlying ideas without getting involved with the additional technicalities of the general case.

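We will assume that all variables are $\pm 1$-valued, i.e. $\mathcal{X}_i = \{-1, +1\}$ for all $i \in \mathcal{N}$. We take the factor index set as $\mathcal{I} = \mathcal{I}_1 \cup \mathcal{I}_2$ with $\mathcal{I}_1 = \mathcal{N}$ (the "local evidence") and $\mathcal{I}_2 \subseteq \{\{i, j\} : i, j \in \mathcal{N}, i \neq j\}$ (the "pair-potentials"). The probability measure (1) can then be written as

$$P(x) = \frac{1}{Z} \exp\Big( \sum_{\{i,j\}\in\mathcal{I}_2} J_{ij} x_i x_j + \sum_{i\in\mathcal{I}_1} \theta_i x_i \Big) \tag{5}$$

for some choice of the parameters $J_{ij}$ ("couplings") and $\theta_i$ ("local fields"), with $\psi^i(x_i) = \exp(\theta_i x_i)$ for $i \in \mathcal{I}_1$ and $\psi^{\{i,j\}}(x_i, x_j) = \exp(J_{ij} x_i x_j)$ for $\{i, j\} \in \mathcal{I}_2$.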

Note from (4) that the messages sent from single-variable factors $\mathcal{I}_1$ to variables are constant. Thus the question whether messages converge can be decided by studying only the messages sent from pair-potentials $\mathcal{I}_2$ to variables. We will thus ignore messages that are sent from single-variable factors. It turns out to be advantageous to use the following "natural" parameterization of the messages

$$\tanh \nu^{i\to j} := \mu^{\{i,j\}\to j}(x_j = 1) - \mu^{\{i,j\}\to j}(x_j = -1), \tag{6}$$

where $\nu^{i\to j} \in \mathbb{R}$ is now interpreted as a message sent from variable $i$ to variable $j$ (instead of a message sent from the factor $\{i, j\}$ to variable $j$). Note that in the pairwise case, the product over $j \in I \setminus i$ in (4) becomes trivial. Some additional elementary algebraic manipulations show that the LBP update equations become particularly simple in this parameterization and can be written as:



$$\tanh \tilde\nu^{i\to j} = \tanh(J_{ij}) \tanh\Big( \theta_i + \sum_{t\in\partial i\setminus j} \nu^{t\to i} \Big) \tag{7}$$

where $\partial i = \{t \in \mathcal{N} : \{i, t\} \in \mathcal{I}_2\}$ are the variables that interact with $i$ via a pair-potential.

Defining the set of ordered pairs $D := \{i \to j : \{i, j\} \in \mathcal{I}_2\}$, we see that the parallel LBP update is a mapping $f : \mathbb{R}^D \to \mathbb{R}^D$; (7) specifies the component $(f(\nu))^{i\to j} := \tilde\nu^{i\to j}$ in terms of the components of $\nu$. Our goal is now to derive sufficient conditions under which the mapping $f$ is a contraction. For this we need some elementary but powerful mathematical theorems.
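The update (7) is simple enough to write down directly. The sketch below stores the messages in a dict keyed by ordered pairs `(i, j)`, the couplings in a dict keyed by `frozenset({i, j})`, and the local fields in a dict; this layout and the names `lbp_step` and `run` are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def lbp_step(nu, J, theta):
    """One parallel application of update (7) to all messages nu[(i, j)]."""
    new = {}
    for (i, j) in nu:
        # theta_i plus all messages t -> i coming from neighbors t other than j.
        cavity = theta[i] + sum(nu[(t, k)] for (t, k) in nu if k == i and t != j)
        coupling = J[frozenset((i, j))]
        new[(i, j)] = math.atanh(math.tanh(coupling) * math.tanh(cavity))
    return new

def run(J, theta, iters=200):
    """Iterate the parallel update from all-zero messages."""
    nu = {}
    for edge in J:
        i, j = tuple(edge)
        nu[(i, j)] = 0.0
        nu[(j, i)] = 0.0
    for _ in range(iters):
        nu = lbp_step(nu, J, theta)
    return nu

# Hypothetical example: a triangle with weak uniform couplings.
triangle_J = {frozenset(e): 0.3 for e in [(0, 1), (1, 2), (0, 2)]}
triangle_theta = {0: 0.2, 1: -0.1, 2: 0.4}
triangle_nu = run(triangle_J, triangle_theta)
```

For couplings this weak the iteration settles quickly; whether that is guaranteed in general is the question the following subsections address.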



A. Normed spaces, contractions and bounds

In this subsection we introduce some (standard) notation and remind the reader of some elementary but important properties of vector norms, matrix norms, contractions and the Mean Value Theorem in arbitrary normed vector spaces, which are the main mathematical ingredients for our basic tool, Lemma 2. The reader familiar with these topics can skip this subsection and proceed directly to Lemma 2 in section III-B.

Let $(V, \|\cdot\|)$ be a normed finite-dimensional real vector space. Examples of norms that will be important later on are the $\ell_1$-norm on $\mathbb{R}^N$, defined by

$$\|x\|_1 := \sum_{i=1}^N |x_i|$$

and the $\ell_\infty$-norm on $\mathbb{R}^N$, defined by

$$\|x\|_\infty := \max_{i\in\{1,\dots,N\}} |x_i|.$$

A norm on a vector space $V$ induces a metric on $V$ by the definition $d(v, w) := \|v - w\|$. The resulting metric space is complete⁴.

Let $(X, d)$ be a metric space. A mapping $f : X \to X$ is called a contraction with respect to $d$ if there exists $0 \le K < 1$ such that

$$d(f(x), f(y)) \le K d(x, y) \quad \text{for all } x, y \in X. \tag{8}$$

In case $d$ is induced by a norm $\|\cdot\|$, we will call a contraction with respect to $d$ a $\|\cdot\|$-contraction. If $(X, d)$ is complete, we can apply the following celebrated theorem, due to Banach:

Theorem 1 (Contracting Mapping Principle): Let $f : X \to X$ be a contraction of a complete metric space $(X, d)$. Then $f$ has a unique fixed point $x_\infty \in X$ and for any $x \in X$, the sequence $x, f(x), f^2(x), \dots$ obtained by iterating $f$ converges to $x_\infty$. The rate of convergence is at least linear, since $d(f(x), x_\infty) \le K d(x, x_\infty)$ for all $x \in X$.

Proof: Can be found in many textbooks on analysis. □

Note that linear convergence means that the error decreases exponentially, indeed $d(x_n, x_\infty) \le C K^n$ for some $C$.

Let $(V, \|\cdot\|)$ be a normed space. The norm induces a matrix norm (also called operator norm) on linear mappings $A : V \to V$, defined as follows:

$$\|A\| := \sup_{v\in V,\ \|v\|\le 1} \|Av\|.$$

The $\ell_1$-norm on $\mathbb{R}^N$ induces the following matrix norm:

$$\|A\|_1 = \max_{j\in\{1,\dots,N\}} \sum_{i=1}^N |A_{ij}| \tag{9}$$

where $A_{ij} := (A e_j)_i$ with $e_j$ the $j$th canonical basis vector.

The $\ell_\infty$-norm on $\mathbb{R}^N$ induces the following matrix norm:

$$\|A\|_\infty = \max_{i\in\{1,\dots,N\}} \sum_{j=1}^N |A_{ij}|. \tag{10}$$

In the following consequence of the well-known Mean Value Theorem, the matrix norm of the derivative ("Jacobian") $f'(v)$ at $v \in V$ of a differentiable mapping $f : V \to V$ is used to bound the distance of the $f$-images of two vectors:

Lemma 1: Let $(V, \|\cdot\|)$ be a normed space and $f : V \to V$ a differentiable mapping. Then, for $x, y \in V$:

$$\|f(y) - f(x)\| \le \|y - x\| \cdot \sup_{z\in[x,y]} \|f'(z)\|$$

where we wrote $[x, y]$ for the segment $\{\lambda x + (1 - \lambda) y : \lambda \in [0, 1]\}$ joining $x$ and $y$.

Proof: See [20, Thm. 8.5.4]. □



B. The basic tool 

Combining Theorem 1 and Lemma 1 immediately yields our basic tool:

Lemma 2: Let $(V, \|\cdot\|)$ be a normed space, $f : V \to V$ differentiable and suppose that

$$\sup_{v\in V} \|f'(v)\| < 1.$$

Then $f$ is a $\|\cdot\|$-contraction by Lemma 1. Hence, for any $v \in V$, the sequence $v, f(v), f^2(v), \dots$ converges to a unique fixed point $v_\infty \in V$ with a convergence rate that is at least linear by Theorem 1. □
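A one-dimensional illustration of Lemma 2 may help fix ideas. The mapping $f(x) = K \cos x$ with $K = 0.5$ is an assumed example, not taken from the paper: its derivative satisfies $|f'(x)| = K |\sin x| \le K < 1$ everywhere, so the lemma applies with the absolute value as norm, and the error should shrink at least by the factor $K$ per iteration.

```python
import math

K = 0.5  # bound on |f'(x)|, hence a valid contraction constant

def f(x):
    """Assumed example mapping with sup |f'| = K < 1."""
    return K * math.cos(x)

def iterate(g, x, n):
    """Apply g to x exactly n times."""
    for _ in range(n):
        x = g(x)
    return x

# Numerical fixed point, reached from an arbitrary starting value.
x_inf = iterate(f, 0.0, 200)

def error_after(x0, n):
    """Distance to the fixed point after n iterations started at x0."""
    return abs(iterate(f, x0, n) - x_inf)
```

Starting from any other point, `error_after(x0, n)` stays below `K**n * abs(x0 - x_inf)`, which is the "at least linear" rate stated in Theorem 1.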



C. Sufficient conditions for LBP to be a contraction 

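We apply Lemma 2 to the case at hand: the parallel LBP update mapping $f : \mathbb{R}^D \to \mathbb{R}^D$, written out in components in (7). Different choices of the vector norm on $\mathbb{R}^D$ will yield different sufficient conditions for whether iterating $f$ will converge to a unique fixed point. We will study two examples: the $\ell_1$ norm and the $\ell_\infty$ norm.

The derivative of $f$ is easily calculated from (7) and is given by

$$(f'(\nu))_{i\to j,\,k\to l} = \frac{\partial \tilde\nu^{i\to j}}{\partial \nu^{k\to l}} = A_{i\to j,\,k\to l}\, B_{i\to j}(\nu) \tag{11}$$

where⁵

$$B_{i\to j}(\nu) := \frac{1 - \tanh^2\big( \theta_i + \sum_{t\in\partial i\setminus j} \nu^{t\to i} \big)}{1 - \tanh^2\big( \tilde\nu^{i\to j}(\nu) \big)} \operatorname{sgn} J_{ij} \tag{12}$$

$$A_{i\to j,\,k\to l} := \tanh|J_{ij}|\, \delta_{i,l}\, 1_{\partial i\setminus j}(k). \tag{13}$$

Note that we have absorbed all $\nu$-dependence in the factor $B_{i\to j}(\nu)$; the reason for this will become apparent later on. The factor $A_{i\to j,\,k\to l}$ is nonnegative and independent of $\nu$ and captures the structure of the graphical model. Note that $\sup_{\nu\in V} |B_{i\to j}(\nu)| = 1$, implying that

$$\left| \frac{\partial \tilde\nu^{i\to j}}{\partial \nu^{k\to l}} \right| \le A_{i\to j,\,k\to l} \tag{14}$$

everywhere on $V$.

⁴ Completeness is a topological property which we will not further discuss, but we need this to apply Theorem 1.

⁵ For a set $X$, we define the indicator function $1_X$ of $X$ by $1_X(x) = 1$ if $x \in X$ and $1_X(x) = 0$ if $x \notin X$.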
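1) Example: the $\ell_\infty$-norm: The $\ell_\infty$-norm on $\mathbb{R}^D$ yields the following condition:

Corollary 1: For binary variables with pairwise interactions: if

$$\max_{i\in\mathcal{N}} \Big( (|\partial i| - 1) \max_{j\in\partial i} \tanh|J_{ij}| \Big) < 1, \tag{15}$$

LBP is an $\ell_\infty$-contraction and converges to a unique fixed point, irrespective of the initial messages.

Proof: Using (10), (13) and (14):

$$\|f'(\nu)\|_\infty = \max_{i\to j} \sum_{k\to l} \left| \frac{\partial \tilde\nu^{i\to j}}{\partial \nu^{k\to l}} \right| \le \max_{i\to j} \sum_{k\to l} \tanh|J_{ij}|\, \delta_{i,l}\, 1_{\partial i\setminus j}(k)$$

$$= \max_{i\in\mathcal{N}} \max_{j\in\partial i} \sum_{k\in\partial i\setminus j} \tanh|J_{ij}| = \max_{i\in\mathcal{N}} \Big( (|\partial i| - 1) \max_{j\in\partial i} \tanh|J_{ij}| \Big),$$

and now simply apply Lemma 2. □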

2) Another example: the $\ell_1$-norm: Using the $\ell_1$-norm instead, we find:

Corollary 2: For binary variables with pairwise interactions: if

$$\max_{i\in\mathcal{N}} \max_{k\in\partial i} \sum_{j\in\partial i\setminus k} \tanh|J_{ij}| < 1, \tag{16}$$

LBP is an $\ell_1$-contraction and converges to a unique fixed point, irrespective of the initial messages.

Proof: Similar to the proof of Corollary 1, now using (9) instead of (10):

$$\|f'(\nu)\|_1 \le \max_{k\to l} \sum_{i\to j} \tanh|J_{ij}|\, \delta_{i,l}\, 1_{\partial i\setminus j}(k) = \max_{i\in\mathcal{N}} \max_{k\in\partial i} \sum_{j\in\partial i\setminus k} \tanh|J_{ij}|.$$

□

It is easy to see that condition (16) is implied by (15), but not conversely; thus in this case the $\ell_1$-norm yields a tighter bound than the $\ell_\infty$-norm.
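Both conditions are cheap to evaluate from the couplings alone. The sketch below computes the left-hand sides of (15) and (16); the dict-of-`frozenset` representation and the function names are assumptions made for illustration. The star-shaped example is chosen so that the $\ell_1$ condition holds while the $\ell_\infty$ condition fails.

```python
import math

def neighbors(J):
    """Map each variable to the set of variables it shares a coupling with."""
    nb = {}
    for edge in J:
        i, j = tuple(edge)
        nb.setdefault(i, set()).add(j)
        nb.setdefault(j, set()).add(i)
    return nb

def weight(J, i, j):
    """tanh |J_ij| for the pair {i, j}."""
    return math.tanh(abs(J[frozenset((i, j))]))

def linf_bound(J):
    """Left-hand side of condition (15)."""
    nb = neighbors(J)
    return max((len(nb[i]) - 1) * max(weight(J, i, j) for j in nb[i])
               for i in nb)

def l1_bound(J):
    """Left-hand side of condition (16)."""
    nb = neighbors(J)
    return max(sum(weight(J, i, j) for j in nb[i] if j != k)
               for i in nb for k in nb[i])

# Hypothetical star graph: center 0, leaves 1, 2, 3, one strong coupling.
star_J = {frozenset((0, 1)): 0.1, frozenset((0, 2)): 0.2, frozenset((0, 3)): 1.0}
```

In the star example the single strong coupling dominates the maximum in (15), whereas the sum in (16) counts each coupling once, which is why the $\ell_1$ condition can still be satisfied.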

D. Beyond norms: the spectral radius 

Instead of pursuing a search for the optimal norm, we will derive a criterion for convergence based on the spectral radius of the matrix (13). The key idea is to look at several iterations of LBP at once. This will yield a significantly stronger condition for convergence of LBP to a unique fixed point.

For a square matrix $A$, we denote by $\sigma(A)$ its spectrum, i.e. the set of eigenvalues of $A$. By $\rho(A)$ we denote its spectral radius, which is defined as $\rho(A) := \sup |\sigma(A)|$, i.e. the largest modulus of eigenvalues of $A$⁶.

Lemma 3: Let $f : X \to X$ be a mapping, $d$ a metric on $X$ and suppose that $f^N$ is a $d$-contraction for some $N \in \mathbb{N}$. Then $f$ has a unique fixed point $x_\infty$ and for any $x \in X$, the sequence $x, f(x), f^2(x), \dots$ obtained by iterating $f$ converges to $x_\infty$.

Proof: Take any $x \in X$. Consider the $N$ sequences obtained by iterating $f^N$, starting respectively in $x, f(x), \dots, f^{N-1}(x)$:

$$x,\ f^N(x),\ f^{2N}(x),\ \dots$$
$$f(x),\ f^{N+1}(x),\ f^{2N+1}(x),\ \dots$$
$$\vdots$$
$$f^{N-1}(x),\ f^{2N-1}(x),\ f^{3N-1}(x),\ \dots$$

Each sequence converges to $x_\infty$ since $f^N$ is a $d$-contraction with fixed point $x_\infty$. But then the sequence $x, f(x), f^2(x), \dots$ must converge to $x_\infty$. □

⁶ One should not confuse the spectral radius $\rho(A)$ with the spectral norm $\|A\|_2 = \sqrt{\rho(A^T A)}$ of $A$, the matrix norm induced by the $\ell_2$-norm.
Theorem 2: Let $f : \mathbb{R}^m \to \mathbb{R}^m$ be differentiable and suppose that $f'(x) = B(x) A$, where $A$ has nonnegative entries and $B$ is diagonal with bounded entries $|B_{ii}(x)| \le 1$. If $\rho(A) < 1$ then for any $x \in \mathbb{R}^m$, the sequence $x, f(x), f^2(x), \dots$ obtained by iterating $f$ converges to a fixed point $x_\infty$, which does not depend on $x$.

Proof: For a matrix $B$, we will denote by $|B|$ the matrix with entries $|B|_{ij} = |B_{ij}|$. For two matrices $B, C$ we will write $B \le C$ if $B_{ij} \le C_{ij}$ for all entries $(i, j)$. Note that if $|B| \le |C|$, then $\|B\|_1 \le \|C\|_1$. Also note that $|BC| \le |B|\,|C|$. Finally, if $0 \le A$ and $B \le C$, then $AB \le AC$ and $BA \le CA$.

Using these observations and the chain rule, we have for any $n = 1, 2, \dots$ and any $x \in \mathbb{R}^m$:

$$|(f^n)'(x)| = \Big| \prod_{i=1}^n f'\big(f^{i-1}(x)\big) \Big| \le \prod_{i=1}^n \Big( \big|B\big(f^{i-1}(x)\big)\big|\, A \Big) \le A^n,$$

hence $\|(f^n)'(x)\|_1 \le \|A^n\|_1$.

By the Gelfand spectral radius theorem,

$$\lim_{n\to\infty} \|A^n\|_1^{1/n} = \rho(A).$$

Choose $\epsilon > 0$ such that $\rho(A) + \epsilon < 1$. For some $N$, $\|A^N\|_1 \le (\rho(A) + \epsilon)^N < 1$. Hence for all $x \in \mathbb{R}^m$, $\|(f^N)'(x)\|_1 < 1$. Applying Lemma 2 we conclude that $f^N$ is an $\ell_1$-contraction. Now apply Lemma 3. □
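The role of Lemma 3 in this proof can be seen on a tiny linear example, chosen here purely for illustration: for $f(x) = Ax$ with the nonnegative matrix $A$ below (and $B$ the identity), $\|A\|_1 = 2$, so $f$ itself is not an $\ell_1$-contraction, yet $A^2 = 0.2\, I$, so $f^2$ is one and the iterates still converge. The helper names are assumptions.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def norm1(A):
    """Induced l1 matrix norm (9): the maximum absolute column sum."""
    n = len(A)
    return max(sum(abs(A[r][c]) for r in range(n)) for c in range(n))

def power(A, n):
    """A multiplied by itself n times (n >= 1)."""
    P = A
    for _ in range(n - 1):
        P = matmul(P, A)
    return P

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Assumed example: rho(A) = sqrt(0.2) < 1 although ||A||_1 = 2 > 1.
A = [[0.0, 2.0], [0.1, 0.0]]

# Gelfand's formula: ||A^n||_1^(1/n) approaches rho(A) as n grows.
gelfand_40 = norm1(power(A, 40)) ** (1.0 / 40)

x = [1.0, 1.0]
for _ in range(60):
    x = apply(A, x)  # iterates of f converge to the fixed point 0
```

Looking at the individual steps, the $\ell_1$ norm of the iterate can grow on one step and shrink on the next; only the two-step map contracts uniformly.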
Using (11), (12) and (13), this immediately yields:

Corollary 3: For binary variables with pairwise interactions, LBP converges to a unique fixed point, irrespective of the initial messages, if the spectral radius of the $|D| \times |D|$-matrix

$$A_{i\to j,\,k\to l} := \tanh|J_{ij}|\, \delta_{i,l}\, 1_{\partial i\setminus j}(k)$$

is strictly smaller than 1. □

The calculation of the spectral radius of the (sparse) matrix $A$ can be done using standard numerical techniques in linear algebra.
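As a hedged sketch of such a calculation, the code below builds the matrix of Corollary 3 as a dense list of rows and bounds its spectral radius from above. The bound used is the Collatz-Wielandt inequality for entrywise nonnegative matrices ($\rho(M) \le \max_r (Mv)_r / v_r$ for any positive vector $v$), applied to $A + I$ during power iteration; this particular technique, the dense representation, and the function names are choices made for this illustration, not something the paper prescribes. A sparse eigenvalue solver would be the natural tool for large graphs.

```python
import math

def build_A(J):
    """Matrix of Corollary 3; rows and columns are indexed by ordered pairs."""
    nb = {}
    for edge in J:
        i, j = tuple(edge)
        nb.setdefault(i, set()).add(j)
        nb.setdefault(j, set()).add(i)
    D = [(i, j) for i in sorted(nb) for j in sorted(nb[i])]
    index = {d: n for n, d in enumerate(D)}
    A = [[0.0] * len(D) for _ in D]
    for (i, j) in D:
        for k in nb[i] - {j}:
            # Entry (i -> j, k -> i) equals tanh |J_ij| for k in di \ j.
            A[index[(i, j)]][index[(k, i)]] = math.tanh(abs(J[frozenset((i, j))]))
    return D, A

def spectral_radius_upper(A, iters=500):
    """Upper bound on rho(A) for a nonnegative matrix A.

    Runs power iteration on A + I, whose spectral radius is rho(A) + 1,
    and keeps the smallest Collatz-Wielandt ratio seen.
    """
    n = len(A)
    v = [1.0] * n
    bound = float("inf")
    for _ in range(iters):
        w = [v[r] + sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
        bound = min(bound, max(w[r] / v[r] for r in range(n)))
        scale = max(w)
        v = [x / scale for x in w]
    return bound - 1.0

# Hypothetical example: complete graph on four variables, uniform couplings.
k4_J = {frozenset((a, b)): 0.5 for a in range(4) for b in range(a + 1, 4)}
k4_D, k4_A = build_A(k4_J)
k4_rho = spectral_radius_upper(k4_A)
```

Because the returned value never underestimates the spectral radius, a result below 1 certifies the condition of Corollary 3; on graphs where the iteration converges slowly the bound may be loose.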

Any matrix norm of $A$ is actually an upper bound on the spectral radius $\rho(A)$, since for any eigenvalue $\lambda$ of $A$ with eigenvector $x$ we have $|\lambda|\, \|x\| = \|\lambda x\| = \|Ax\| \le \|A\|\, \|x\|$, hence $\rho(A) \le \|A\|$. This implies that no norm in Lemma 2 will result in a sharper condition than Corollary 3, hence the title of this section.

Further, for a given matrix $A$ and some $\epsilon > 0$, there exists a vector norm $\|\cdot\|$ such that the induced matrix norm of $A$ satisfies $\rho(A) \le \|A\| \le \rho(A) + \epsilon$; see [21] for a constructive proof. Thus for given $A$ one can approximate $\rho(A)$ arbitrarily closely by induced matrix norms. This immediately gives a result on the convergence rate of LBP (in case $\rho(A) < 1$): for any $\epsilon > 0$, there exists a norm-induced metric such that the linear rate of contraction of LBP with respect to that metric is bounded from above by $\rho(A) + \epsilon$.

One might think that there is a shorter proof of Corollary 3: it seems quite plausible intuitively that in general, for a continuously differentiable $f : \mathbb{R}^m \to \mathbb{R}^m$, iterating $f$ will converge to a unique fixed point if $\sup_{x\in\mathbb{R}^m} \rho(f'(x)) < 1$. However, this conjecture (which has been open for a long time) has been shown to be true in two dimensions but false in higher dimensions [22].

E. Improved bound for strong local evidence 

Empirically, it is known that the presence of strong local fields (i.e. single variable factors which are far from uniform) often improves the convergence of LBP. However, our results so far are completely independent of the parameters $\{\theta_i\}_{i\in\mathcal{N}}$ that measure the strength of the local evidence. By proceeding more carefully than we have done above, the results can easily be improved in such a way that local evidence is taken into account.

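Consider the quantity $B_{i\to j}$ defined in (12). We have bounded this quantity by noting that $\sup_{\nu\in V} |B_{i\to j}(\nu)| = 1$. Note that for all LBP updates (except for the first one), the argument $\nu$ (the incoming messages) is in $f(V)$, which can be considerably smaller than the complete vector space $V$. Thus, after the first LBP update, we can use

$$\sup_{\nu\in f(V)} |B_{i\to j}(\nu)| = \sup_{\nu\in f(V)} \frac{1 - \tanh^2\big( \theta_i + \sum_{k\in\partial i\setminus j} \nu^{k\to i} \big)}{1 - \tanh^2\big( \tilde\nu^{i\to j}(\nu) \big)} = \sup_{\nu\in f(V)} \frac{1 - \tanh^2(h^{i\setminus j})}{1 - \tanh^2(J_{ij}) \tanh^2(h^{i\setminus j})}$$

where we used (7) and defined the cavity field

$$h^{i\setminus j}(\nu) := \theta_i + \sum_{k\in\partial i\setminus j} \nu^{k\to i}. \tag{17}$$

The function $x \mapsto \dfrac{1 - \tanh^2 x}{1 - \tanh^2(J_{ij}) \tanh^2 x}$ is strictly decreasing for $x \ge 0$ and symmetric around $x = 0$, thus, defining

$$h_*^{i\setminus j} := \inf_{\nu\in f(V)} \big| h^{i\setminus j}(\nu) \big|, \tag{18}$$

we obtain

$$\sup_{\nu\in f(V)} |B_{i\to j}(\nu)| = \frac{1 - \tanh^2(h_*^{i\setminus j})}{1 - \tanh^2(J_{ij}) \tanh^2(h_*^{i\setminus j})}.$$

Now, from (7) we derive that

$$\{\nu^{k\to i} : \nu \in f(V)\} = (-|J_{ki}|, |J_{ki}|),$$

hence

$$\{h^{i\setminus j}(\nu) : \nu \in f(V)\} = (h_-^{i\setminus j}, h_+^{i\setminus j})$$

where we defined

$$h_\pm^{i\setminus j} := \theta_i \pm \sum_{k\in\partial i\setminus j} |J_{ki}|.$$

We conclude that $h_*^{i\setminus j}$ is simply the distance between 0 and the interval $(h_-^{i\setminus j}, h_+^{i\setminus j})$, i.e.

$$h_*^{i\setminus j} = \begin{cases} |h_+^{i\setminus j}| & \text{if } h_+^{i\setminus j} < 0 \\ h_-^{i\setminus j} & \text{if } h_-^{i\setminus j} > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Thus the element $A_{i\to j,\,k\to i}$ (for $i \in \partial j$, $k \in \partial i \setminus j$) of the matrix $A$ defined in Corollary 3 can be replaced by

$$\tanh|J_{ij}|\, \frac{1 - \tanh^2(h_*^{i\setminus j})}{1 - \tanh^2(J_{ij}) \tanh^2(h_*^{i\setminus j})} = \frac{\tanh(|J_{ij}| - h_*^{i\setminus j}) + \tanh(|J_{ij}| + h_*^{i\setminus j})}{2},$$

which is generally smaller than $\tanh|J_{ij}|$ and thus gives a tighter bound.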

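This trick can be repeated arbitrarily often: assume that $m \ge 0$ LBP updates have been done already, which means that it suffices to take the supremum of $|B_{i\to j}(\nu)|$ over $\nu \in f^m(V)$. Define for all $i \to j \in D$ and all $t = 0, 1, \dots, m$:

$$\underline{h}_t^{i\setminus j} := \inf\{h^{i\setminus j}(\nu) : \nu \in f^t(V)\}, \tag{19}$$

$$\overline{h}_t^{i\setminus j} := \sup\{h^{i\setminus j}(\nu) : \nu \in f^t(V)\}, \tag{20}$$

and define the intervals

$$\mathcal{H}_t^{i\setminus j} := \big[ \underline{h}_t^{i\setminus j}, \overline{h}_t^{i\setminus j} \big]. \tag{21}$$

Specifically, for $t = 0$ we have $\underline{h}_0^{i\setminus j} = -\infty$ and $\overline{h}_0^{i\setminus j} = \infty$, which means that

$$\mathcal{H}_0^{i\setminus j} = \mathbb{R}. \tag{22}$$

Using (7) and (17), we obtain the following recursive relations for the intervals (where we use interval arithmetic defined in the obvious way):

$$\mathcal{H}_{t+1}^{i\setminus j} = \theta_i + \sum_{k\in\partial i\setminus j} \tanh^{-1}\big( \tanh J_{ki} \tanh \mathcal{H}_t^{k\setminus i} \big). \tag{23}$$

Using this recursion relation, one can calculate $\mathcal{H}_m^{i\setminus j}$ and define $h_*^{i\setminus j}$ as the distance (in absolute value) of the interval $\mathcal{H}_m^{i\setminus j}$ to 0:

$$h_*^{i\setminus j} = \begin{cases} |\overline{h}_m^{i\setminus j}| & \text{if } \overline{h}_m^{i\setminus j} < 0 \\ \underline{h}_m^{i\setminus j} & \text{if } \underline{h}_m^{i\setminus j} > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{24}$$

Thus by replacing the matrix $A$ in Corollary 3 by

$$A_{i\to j,\,k\to l} = \frac{\tanh(|J_{ij}| - h_*^{i\setminus j}) + \tanh(|J_{ij}| + h_*^{i\setminus j})}{2}\, \delta_{i,l}\, 1_{\partial i\setminus j}(k), \tag{25}$$

we obtain stronger results that improve as $m$ increases:

Corollary 4: Let $m \ge 0$. For binary variables with pairwise interactions, LBP converges to a unique fixed point, irrespective of the initial messages, if the spectral radius of the $|D| \times |D|$-matrix defined in (25) (with $h_*^{i\setminus j}$ defined in equations (21)-(24)) is strictly smaller than 1. □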
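The interval recursion (23) and the resulting matrix entries (25) can be sketched as follows. The data layout (couplings keyed by `frozenset`, intervals keyed by ordered pairs `(i, j)` standing for the cavity field of `i` with `j` removed) and all function names are assumptions for illustration. The code relies on the fact that $x \mapsto \tanh^{-1}(\tanh J \tanh x)$ is monotone, so the image of an interval is spanned by the images of its endpoints; it also assumes moderate couplings, since `math.atanh` fails once `tanh(J)` rounds to 1.

```python
import math

def cavity_intervals(J, theta, m):
    """Intervals for the cavity fields after m steps of recursion (23)."""
    nb = {}
    for edge in J:
        i, j = tuple(edge)
        nb.setdefault(i, set()).add(j)
        nb.setdefault(j, set()).add(i)
    D = [(i, j) for i in nb for j in nb[i]]
    H = {d: (-math.inf, math.inf) for d in D}  # starting point (22)
    for _ in range(m):
        new = {}
        for (i, j) in D:
            lo = hi = theta[i]
            for k in nb[i] - {j}:
                t = math.tanh(J[frozenset((k, i))])
                a, b = H[(k, i)]
                ends = (math.atanh(t * math.tanh(a)), math.atanh(t * math.tanh(b)))
                lo += min(ends)
                hi += max(ends)
            new[(i, j)] = (lo, hi)
        H = new
    return H

def h_star(interval):
    """Distance from 0 to the interval, as in (24)."""
    lo, hi = interval
    if hi < 0:
        return -hi
    if lo > 0:
        return lo
    return 0.0

def improved_entry(Jij, hs):
    """Nonzero entry of the matrix (25) for coupling Jij and distance hs."""
    a = abs(Jij)
    return 0.5 * (math.tanh(a - hs) + math.tanh(a + hs))

# Hypothetical example: triangle with uniform couplings and strong fields.
tri_J = {frozenset(e): 0.5 for e in [(0, 1), (1, 2), (0, 2)]}
tri_theta = {0: 2.0, 1: 2.0, 2: 2.0}
H1 = cavity_intervals(tri_J, tri_theta, 1)
H2 = cavity_intervals(tri_J, tri_theta, 2)
```

With `m = 0` every interval is the whole real line, `h_star` returns 0, and `improved_entry` reduces to `tanh(abs(Jij))`, the entry used in Corollary 3; the resulting entries can then be fed to any spectral radius routine.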

IV. General case 

In the general case, when the domains $\mathcal{X}_i$ are arbitrarily large (but finite), we do not know of a natural parameterization of the messages that automatically takes care of the invariance of the messages $\mu^{I\to j}$ under scaling (like (6) does in the binary case). Instead of handling the scale invariance by the parameterization and using standard norms and metrics, it seems easier to take a simple parameterization and to change the norms and metrics in such a way that they are insensitive to the (irrelevant) extra degrees of freedom arising from the scale invariance. This is actually the key insight in extending the previous results beyond the binary case: once one sees how to do this, the rest follows in a (more or less) straightforward way.

Another important point is to reparameterize the messages: a natural parameterization for our analysis is now in terms of logarithms of messages $\lambda^{I\to i} := \log \mu^{I\to i}$. The LBP update equations can be written in terms of the log-messages as:

$$\tilde\lambda^{I\to i}(x_i) = \log \sum_{x_{I\setminus i}} \psi^I(x_I)\, h^{I\setminus i}(x_{I\setminus i}) \tag{26}$$

where we dropped the normalization and defined

$$h^{I\setminus i}(x_{I\setminus i}) := \exp\Big( \sum_{j\in I\setminus i} \sum_{J\in N_j\setminus I} \lambda^{J\to j}(x_j) \Big). \tag{27}$$

Each log-message $\lambda^{I\to i}$ is a vector in the vector space $V^{I\to i} := \mathbb{R}^{\mathcal{X}_i}$; we will use Greek letters as indices for the components, e.g. $\lambda_\alpha^{I\to i} := \lambda^{I\to i}(\alpha)$ with $\alpha \in \mathcal{X}_i$. We will call everything that concerns individual vector spaces $V^{I\to i}$ local and define the global vector space $V$ as the direct sum of the local vector spaces:

$$V := \bigoplus_{i\in I\in\mathcal{I}} V^{I\to i}.$$

The parallel LBP update is the mapping $f : V \to V$, written out in components in (26) and (27).

Note that the invariance of the messages $\mu^{I\to i}$ under scaling amounts to invariance of the log-messages $\lambda^{I\to i}$ under translation. More formally, defining linear subspaces

$$W^{I\to i} := \{\lambda \in V^{I\to i} : \lambda_\alpha = \lambda_{\alpha'} \text{ for all } \alpha, \alpha' \in \mathcal{X}_i\} \tag{28}$$

and their direct sum

$$W := \bigoplus_{i\in I\in\mathcal{I}} W^{I\to i} \subseteq V,$$

the invariance amounts to the observation that

$$f(\lambda + w) - f(\lambda) \in W \quad \text{for all } \lambda \in V,\ w \in W.$$

Since $\lambda + w$ and $\lambda$ are equivalent for our purposes, we want our measures of distance in $V$ to reflect this equivalence. Therefore we will "divide out" the equivalence relation and work in the quotient space $V/W$, which is the topic of the next subsection.



A. Quotient spaces 

Let $V$ be a finite-dimensional vector space. Let $W$ be a linear subspace of $V$. We can consider the quotient space $V/W := \{v + W : v \in V\}$, where $v + W := \{v + w : w \in W\}$. Defining addition and scalar multiplication on the quotient space in the natural way, the quotient space is again a vector space⁷. We will denote its elements as $\bar v := v + W$. Note that the projection $\pi : V \to V/W : v \mapsto \bar v$ is linear.

Let $\|\cdot\|$ be any vector norm on $V$. It induces a quotient norm on $V/W$, defined by

$$\|\bar v\| := \inf_{w\in W} \|v + w\|, \tag{29}$$

which is indeed a norm, as one easily checks. The quotient norm in turn induces the quotient metric $d(\bar v_1, \bar v_2) := \|\bar v_2 - \bar v_1\|$ on $V/W$. The metric space $(V/W, d)$ is complete (since any finite-dimensional normed vector space is complete).

Let $f : V \to V$ be a (possibly non-linear) mapping with the following symmetry:

$$f(v + w) - f(v) \in W \quad \text{for all } v \in V,\ w \in W. \tag{30}$$

We can then unambiguously define the quotient mapping

$$\bar f : V/W \to V/W : \bar v \mapsto \overline{f(v)},$$

which yields the following commutative diagram:

$$\begin{array}{ccc} V & \xrightarrow{\;f\;} & V \\ \downarrow{\scriptstyle \pi} & & \downarrow{\scriptstyle \pi} \\ V/W & \xrightarrow{\;\bar f\;} & V/W \end{array} \qquad \pi \circ f = \bar f \circ \pi$$

For a linear mapping $A : V \to V$, condition (30) amounts to $AW \subseteq W$, i.e. $A$ should leave $W$ invariant; we can then unambiguously define the quotient mapping $\bar A : V/W \to V/W : \bar v \mapsto \overline{Av}$.

If $f : V \to V$ is differentiable and satisfies (30), the symmetry property (30) implies that $f'(x) W \subseteq W$, hence we can define $\overline{f'(x)} : V/W \to V/W$. The operation of taking derivatives is compatible with projecting onto the quotient space. Indeed, by using the chain rule and the identity $\pi \circ f = \bar f \circ \pi$, one finds that the derivative of the induced mapping $\bar f : V/W \to V/W$ at $\bar x$ equals the induced derivative of $f$ at $x$:

$$\bar f'(\bar x) = \overline{f'(x)} \quad \text{for all } x \in V. \tag{31}$$

By Lemma 2, $\bar f$ is a contraction with respect to the quotient norm if

$$\sup_{\bar x\in V/W} \big\| \bar f'(\bar x) \big\| < 1.$$

Using (29) and (31), this condition can be written more explicitly as:

$$\sup_{x\in V}\ \sup_{v\in V,\ \|v\|\le 1}\ \inf_{w\in W} \|f'(x) \cdot v + w\| < 1.$$

⁷ Indeed, we have a null vector $0 + W$, addition ($(v_1 + W) + (v_2 + W) := (v_1 + v_2) + W$ for $v_1, v_2 \in V$) and scalar multiplication ($\lambda (v + W) := (\lambda v) + W$ for $\lambda \in \mathbb{R}$, $v \in V$).
B. Constructing a norm on V

Whereas in the binary case, each message $\nu^{i\to j}$ was parameterized by a single real number, the messages are now $|\mathcal{X}_i|$-dimensional vectors $\lambda^{I\to i}$ (with components $\lambda_\alpha^{I\to i}$ indexed by $\alpha \in \mathcal{X}_i$). In extending the $\ell_1$-norm that proved useful in the binary case to the more general case, we have the freedom to choose the "local" part of the generalized $\ell_1$-norm. Here we show how to construct such a generalization of the $\ell_1$-norm and its properties; for a more detailed account of the construction, see Appendix A.

The "global" vector space $V$ is the direct sum of the "local" subspaces $V^{I\to i}$. Suppose that for each subspace $V^{I\to i}$, we have a local norm $\|\cdot\|_{I\to i}$. A natural generalization of the $\ell_1$-norm in the binary case is the following global norm on $V$:

$$\|\lambda\| := \sum_{I\to i} \big\| \lambda^{I\to i} \big\|_{I\to i}. \tag{32}$$

It is easy to check that this is indeed a norm on $V$.

Each subspace $V^{I\to i}$ has a 1-dimensional subspace $W^{I\to i}$ defined in (28) and the local norm on $V^{I\to i}$ induces a local quotient norm on the quotient space $V^{I\to i} / W^{I\to i}$. The global norm (32) on $V$ induces a global quotient norm on $V/W$, which is simply the sum of the local quotient norms (cf. Appendix A):

$$\big\| \bar\lambda \big\| = \sum_{I\to i} \big\| \overline{\lambda^{I\to i}} \big\|_{I\to i}. \tag{33}$$

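Let $\lambda \in V$. The derivative $f'(\lambda)$ of $f : V \to V$ at $\lambda$ is a linear mapping $f'(\lambda) : V \to V$ satisfying $f'(\lambda) W \subseteq W$. It projects down to a linear mapping $\overline{f'(\lambda)} : V/W \to V/W$. The matrix norm of $\overline{f'(\lambda)}$ induced by the quotient norm (33) is given by (cf. Appendix A):

$$\big\| \overline{f'(\lambda)} \big\| = \max_{J\to j} \sum_{I\to i} \Big\| \overline{(f'(\lambda))_{I\to i,\,J\to j}} \Big\|_{I\to i}^{J\to j} \tag{34}$$

where the local quotient matrix norm of the "block" $(f'(\lambda))_{I\to i,\,J\to j}$ is given by (cf. Appendix A):

$$\Big\| \overline{(f'(\lambda))_{I\to i,\,J\to j}} \Big\|_{I\to i}^{J\to j} = \sup_{v\in V^{J\to j},\ \|v\|_{J\to j}\le 1} \Big\| \overline{(f'(\lambda))_{I\to i,\,J\to j}\, v} \Big\|_{I\to i}. \tag{35}$$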


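The derivative of the (unnormalized) parallel LBP update (26) is easily calculated:

$$\frac{\partial \tilde\lambda^{I\to i}(x_i)}{\partial \lambda^{J\to j}(y_j)} = 1_{N_j\setminus I}(J)\, 1_{I\setminus i}(j)\, \frac{\sum_{x_{I\setminus i}} \psi^I(x_i, x_j, x_{I\setminus\{i,j\}})\, \delta_{x_j y_j}\, h^{I\setminus i}(x_{I\setminus i})}{\sum_{x_{I\setminus i}} \psi^I(x_i, x_{I\setminus i})\, h^{I\setminus i}(x_{I\setminus i})}. \tag{36}$$

To lighten the notation, we will use Greek subscripts instead of arguments: let $\alpha$ correspond to $x_i$, $\beta$ to $x_j$, $\beta'$ to $y_j$ and $\gamma$ to $x_{I\setminus\{i,j\}}$; for example, we write $h^{I\setminus i}(x_{I\setminus i})$ as $h_{\beta\gamma}^{I\setminus i}$. Taking the global quotient norm (34) of (36) yields:

$$\big\| \overline{f'(\lambda)} \big\| = \max_{J\to j} \sum_{I\to i} 1_{N_j\setminus I}(J)\, 1_{I\setminus i}(j)\, B_{I\to i,\,J\to j}\big( h^{I\setminus i}(\lambda) \big) \tag{37}$$

where

$$B_{I\to i,\,J\to j}\big( h^{I\setminus i}(\lambda) \big) := \left\| \overline{ \frac{\sum_\gamma \psi_{\alpha\beta'\gamma}^I\, h_{\beta'\gamma}^{I\setminus i}(\lambda)}{\sum_\beta \sum_\gamma \psi_{\alpha\beta\gamma}^I\, h_{\beta\gamma}^{I\setminus i}(\lambda)} } \right\|_{I\to i}^{J\to j}. \tag{38}$$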


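Note that $B_{I\to i,\,J\to j}$ depends on $\lambda$ via the dependence of $h^{I\setminus i}$ on $\lambda$ (cf. (27)). We will for the moment simplify matters by assuming that $\lambda$ can be any vector in $V$, and later discuss the more careful estimate (where $\lambda \in f^m(V)$):

$$\sup_{\lambda\in V} B_{I\to i,\,J\to j}\big( h^{I\setminus i}(\lambda) \big) \le \sup_{h^{I\setminus i}>0} B_{I\to i,\,J\to j}\big( h^{I\setminus i} \big). \tag{39}$$

Defining the matrix $A$ by the expression on the r.h.s. and using (35) and (29), we obtain:

$$A_{I\to i,\,J\to j} := \sup_{h^{I\setminus i}>0} B_{I\to i,\,J\to j}\big( h^{I\setminus i} \big) = \sup_{h^{I\setminus i}>0}\ \sup_{v\in V^{J\to j},\ \|v\|_{J\to j}\le 1}\ \inf_{w\in W^{I\to i}} \left\| \frac{\sum_{\beta'} \sum_\gamma \psi_{\alpha\beta'\gamma}^I\, h_{\beta'\gamma}^{I\setminus i}\, v_{\beta'}}{\sum_\beta \sum_\gamma \psi_{\alpha\beta\gamma}^I\, h_{\beta\gamma}^{I\setminus i}} + w \right\|_{I\to i} \tag{40}$$

for $I \to i$ and $J \to j$ such that $j \in I \setminus i$ and $J \in N_j \setminus I$. Surprisingly, it turns out that we can calculate (40) analytically if we take all local norms to be $\ell_\infty$ norms. We have also tried the $\ell_2$ norm and the $\ell_1$ norm as local norms, but were unable to calculate expression (40) analytically in these cases. Numerical calculations turned out to be difficult because of the nested suprema.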



C. Local norms 

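Take for all local norms $\|\cdot\|_{I\to i}$ the $\ell_\infty$ norm on $V^{I\to i} = \mathbb{R}^{\mathcal{X}_i}$. The local subspace $W^{I\to i}$ is spanned by the vector $\mathbf{1} := (1, 1, \dots, 1) \in \mathbb{R}^{\mathcal{X}_i}$. The local quotient norm of a vector $v \in V^{I\to i}$ is thus given by

$$\|\bar v\|_{I\to i} = \|\bar v\|_\infty = \inf_{w\in\mathbb{R}} \|v + w \mathbf{1}\|_\infty = \frac{1}{2} \sup_{\alpha,\alpha'\in\mathcal{X}_i} |v_\alpha - v_{\alpha'}|. \tag{41}$$

For a linear mapping $A : V^{J\to j} \to V^{I\to i}$ that satisfies $A W^{J\to j} \subseteq W^{I\to i}$, the induced quotient matrix norm is given by

$$\big\| \bar A \big\|_{I\to i}^{J\to j} = \sup_{v\in V^{J\to j},\ \|v\|_\infty\le 1} \big\| \overline{Av} \big\|_\infty = \sup_{v\in V^{J\to j},\ \|v\|_\infty\le 1} \frac{1}{2} \sup_{\alpha,\alpha'\in\mathcal{X}_i} \Big| \sum_{\beta\in\mathcal{X}_j} (A_{\alpha\beta} - A_{\alpha'\beta})\, v_\beta \Big| = \frac{1}{2} \sup_{\alpha,\alpha'\in\mathcal{X}_i} \sum_{\beta\in\mathcal{X}_j} |A_{\alpha\beta} - A_{\alpha'\beta}|. \tag{42}$$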
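Both closed forms are easy to check numerically. In the sketch below, `quotient_linf_norm` and `quotient_linf_matrix_norm` implement the right-hand sides of (41) and (42), while `brute_force_quotient_norm` evaluates the infimum in (41) on a grid of shifts; the function names, the grid, and the test vectors are assumptions chosen for illustration.

```python
def quotient_linf_norm(v):
    """Closed form (41): half the spread between largest and smallest entry."""
    return 0.5 * (max(v) - min(v))

def brute_force_quotient_norm(v, lo=-10.0, hi=10.0, steps=2001):
    """Approximate inf over w of ||v + w * 1||_inf on a grid of shifts w."""
    best = float("inf")
    for s in range(steps):
        w = lo + (hi - lo) * s / (steps - 1)
        best = min(best, max(abs(x + w) for x in v))
    return best

def quotient_linf_matrix_norm(A):
    """Closed form (42): half the largest l1 distance between two rows of A."""
    return 0.5 * max(sum(abs(a - b) for a, b in zip(row_r, row_s))
                     for row_r in A for row_s in A)

# A matrix with equal row sums maps constant vectors to constant vectors,
# so it satisfies the invariance condition required for (42).
example_A = [[0.7, 0.3], [0.2, 0.8]]
```

The quotient norm ignores any constant offset of the vector, which is the translation invariance of log-messages that motivated passing to the quotient space.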



Fixing for the moment $I \to i$ and $J \to j$ (such that $j \in I \setminus i$ and $J \in N_j \setminus I$) and dropping the superscripts from the notation, using (42), we can write (40) as

$$\sup_{h>0} \frac{1}{2} \sup_{\alpha,\alpha'\in\mathcal{X}_i} \sum_\beta \left| \frac{\sum_\gamma \psi_{\alpha\beta\gamma} h_{\beta\gamma}}{\sum_\beta \sum_\gamma \psi_{\alpha\beta\gamma} h_{\beta\gamma}} - \frac{\sum_\gamma \psi_{\alpha'\beta\gamma} h_{\beta\gamma}}{\sum_\beta \sum_\gamma \psi_{\alpha'\beta\gamma} h_{\beta\gamma}} \right|.$$

Fig. 2. Part of the factor graph relevant in expressions (45), (46) and (47). Here $i, j \in I$ with $i \neq j$, and $J \in N_j \setminus I$.



Interchanging the two suprema, fixing (for the moment) a 
and a', defining ^ := V> a/ 3 7 /VV/3 7 and fyg 7 := Vr^x' 
noting that we can without loss of generality assume that h 
is normalized in £% sense, the previous expression (apart from 
the 7> su P a .a') simplifies to 



sup 

h>0, 
\h\\ =1 



2. 





E^ 



01 



E/3 E 7 ^/3 7 ^ 



(43) 



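In Appendix B we show that this equals

$$2 \sup_{\beta \ne \beta'}\ \sup_{\gamma,\gamma'}\ \tanh\left( \frac14 \log \frac{\tilde\psi_{\beta\gamma}}{\tilde\psi_{\beta'\gamma'}} \right). \tag{44}$$

We conclude that if we take all local norms to be the $\ell_\infty$ norms, then the quantity in (40) equals

$$N(\psi^I, i, j) := \sup_{\alpha \ne \alpha'}\ \sup_{\beta \ne \beta'}\ \sup_{\gamma,\gamma'}\ \tanh\left( \frac14 \log \frac{\psi^I_{\alpha\beta\gamma}}{\psi^I_{\alpha'\beta\gamma}} \frac{\psi^I_{\alpha'\beta'\gamma'}}{\psi^I_{\alpha\beta'\gamma'}} \right), \tag{45}$$

which is defined for $i, j \in I$ with $i \ne j$ and where $\psi^I_{\alpha\beta\gamma}$ is shorthand for $\psi^I(x_i = \alpha, x_j = \beta, x_{I \setminus \{i,j\}} = \gamma)$; see Fig. 2 for an illustration.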
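To make definition (45) concrete, here is a minimal brute-force sketch that evaluates the potential strength of a strictly positive factor. The storage layout `psi[a][b][g]` (states of $x_i$, states of $x_j$, joint states of the remaining variables) and the example factor are assumptions made for this illustration only.

```python
import math
from itertools import product


def potential_strength(psi):
    """Evaluate (45) for a strictly positive factor stored as psi[a][b][g]."""
    n_a, n_b, n_g = len(psi), len(psi[0]), len(psi[0][0])
    best = 0.0
    for a, a2 in product(range(n_a), repeat=2):
        for b, b2 in product(range(n_b), repeat=2):
            if a == a2 or b == b2:
                continue  # the suprema in (45) run over a != a' and b != b'
            for g, g2 in product(range(n_g), repeat=2):
                ratio = (psi[a][b][g] * psi[a2][b2][g2]) / (psi[a2][b][g] * psi[a][b2][g2])
                best = max(best, math.tanh(0.25 * math.log(ratio)))
    return best


# Hypothetical example: a three-variable factor exp(J * x_i * x_j * x_k) on +-1 spins.
J = 0.5
spins = (1, -1)
triple = [[[math.exp(J * a * b * g) for g in spins] for b in spins] for a in spins]
strength = potential_strength(triple)

# A factor that does not depend on x_i at all has zero strength.
flat_in_i = [[[1.0 + b + 2 * g for g in range(2)] for b in range(2)] for _ in range(2)]
strength_flat = potential_strength(flat_in_i)
```

For the three-spin example the supremum is attained at $\gamma = \gamma'$ and the result is $\tanh|J|$. The cost of this enumeration grows with the square of the number of joint states of the factor, so it is only meant for small factors.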

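Now combining (37), (39) and (45), we finally obtain:

$$\left\| \overline{f}'(\overline{\lambda}) \right\| = \left\| \overline{f'(\lambda)} \right\| \le \max_{J \to j}\ \sum_{I \in N_j \setminus J}\ \sum_{i \in I \setminus j} N(\psi^I, i, j).$$

Applying Lemma 2 now yields that $\overline{f}$ is a contraction with respect to the quotient norm on $V/W$ if the right-hand side is strictly smaller than 1.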

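Consider the mapping $\eta : V/W \to V$ that maps $\overline{\lambda}$ to the normalized $\lambda \in V$, i.e. such that $\|\exp \lambda^{I\to i}\|_1 = 1$ for all components $I \to i$. If we take for $f$ the $\ell_1$-normalized LBP update (in the log-domain), the following diagram commutes:

$$\begin{array}{ccc} V & \xrightarrow{\ f\ } & V \\ \downarrow{\scriptstyle \pi} & & \uparrow{\scriptstyle \eta} \\ V/W & \xrightarrow{\ \overline{f}\ } & V/W \end{array} \qquad\qquad f = \eta \circ \overline{f} \circ \pi.$$

Since both $\pi$ and $\eta$ are continuous, we can translate convergence results for $\overline{f}$ back to similar results for $f$. We have proved: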

Theorem 3: If

$$\max_{J \to j}\ \sum_{I \in N_j \setminus J}\ \sum_{i \in I \setminus j} N(\psi^I, i, j) < 1, \tag{46}$$

LBP converges to a unique fixed point irrespective of the initial messages. □

Fig. 3. Part of the factor graph in the pairwise case relevant in (48) and (49). Here $k \in \partial i$ and $j \in \partial i \setminus k$.
Now we can also generalize Corollary 3:

Theorem 4: If the spectral radius of the matrix

$$A_{I \to i, J \to j} = 1_{N_j \setminus I}(J)\, 1_{I \setminus i}(j)\, N(\psi^I, i, j) \tag{47}$$

is strictly smaller than 1, LBP converges to a unique fixed point irrespective of the initial messages.

Proof: Similar to the binary pairwise case; see Theorem 10 in Appendix A for details. □

Note that Theorem 3 is a trivial consequence of Theorem 4, since the $\ell_1$-norm is an upper bound on the spectral radius. However, to prove the latter, it seems that we have to go through all the work (and some more) needed to prove the former.

D. Special cases 

In this subsection we study the implications for two special 
cases, namely factor graphs that contain no cycles and the case 
of pairwise interactions. 

1) Trees: Theorem 4 gives us a proof of the well-known fact that LBP converges on trees (whereas Theorem 3 is not strong enough to prove that result):

Corollary 5: If the factor graph is a tree, LBP converges to a unique fixed point irrespective of the initial messages.

Proof: The spectral radius of (47) is easily shown to be zero in this special case, for any choice of the potentials: on a tree, the chains of message dependencies encoded by the nonzero entries of (47) cannot close into a cycle, so the matrix is nilpotent. □

2) Pairwise interactions: We formulate Theorems 3 and 4 for the special case of pairwise interactions (which corresponds to $\gamma$ taking on only one value), i.e. all factors consist of either one or two variables. For a pair-potential $\psi^{ij} = \psi^{ij}_{\alpha\beta}$, expression (45) simplifies to (see also Fig. 3)

$$N(\psi^{ij}) := \sup_{\alpha \ne \alpha'}\ \sup_{\beta \ne \beta'}\ \tanh\left( \frac14 \left( \log \frac{\psi^{ij}_{\alpha\beta}}{\psi^{ij}_{\alpha'\beta}} \frac{\psi^{ij}_{\alpha'\beta'}}{\psi^{ij}_{\alpha\beta'}} \right) \right). \tag{48}$$

Note that this quantity is invariant to "reallocation" of single variable factors $\psi^i$ or $\psi^j$ to the pair factor $\psi^{ij}$ (i.e. $N(\psi^{ij}) = N(\psi^{ij} \psi^i \psi^j)$). $N(\psi^{ij})$ can be regarded as a measure of the strength of the potential $\psi^{ij}$.

The $\ell_1$-norm based condition (46) can be written in the pairwise case as:

$$\max_{i \in \mathcal{N}}\ \max_{k \in \partial i}\ \sum_{j \in \partial i \setminus k} N(\psi^{ij}) < 1. \tag{49}$$

The matrix defined in (47), relevant for the spectral radius condition, can be replaced by the following $|D| \times |D|$-matrix in the pairwise case:

$$A_{i \to j, k \to l} := N(\psi^{ij})\, \delta_{i,l}\, 1_{\partial i \setminus j}(k). \tag{50}$$

For the binary case, we reobtain our earlier results, since

$$N(\exp(J_{ij} x_i x_j)) = \tanh |J_{ij}|.$$
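The pairwise quantities above are easy to evaluate numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: pair factors are stored as nested lists `psi[a][b]`, the graph is given as a hypothetical adjacency dictionary, and the helper names are invented for this example. It checks the binary identity $N(\exp(J x_i x_j)) = \tanh|J|$, the invariance under reallocation of single variable factors, and the left-hand side of (49) on a 4-cycle.

```python
import math


def pair_strength(psi):
    """Evaluate (48) for a strictly positive pair factor psi[a][b]."""
    best = 0.0
    states_i, states_j = range(len(psi)), range(len(psi[0]))
    for a in states_i:
        for a2 in states_i:
            for b in states_j:
                for b2 in states_j:
                    if a != a2 and b != b2:
                        ratio = psi[a][b] * psi[a2][b2] / (psi[a2][b] * psi[a][b2])
                        best = max(best, math.tanh(0.25 * math.log(ratio)))
    return best


def l1_condition_lhs(neighbors, strength):
    """Left-hand side of (49): max_i max_{k in di} sum_{j in di, j != k} N(psi^{ij})."""
    worst = 0.0
    for i, nbrs in neighbors.items():
        for k in nbrs:
            total = sum(strength[frozenset((i, j))] for j in nbrs if j != k)
            worst = max(worst, total)
    return worst


J = 0.4
ising = [[math.exp(J), math.exp(-J)], [math.exp(-J), math.exp(J)]]
n_ising = pair_strength(ising)

# Reallocating single-variable factors (row and column scalings) leaves N unchanged.
row, col = [2.0, 0.5], [3.0, 7.0]
shifted = [[ising[a][b] * row[a] * col[b] for b in range(2)] for a in range(2)]
n_shifted = pair_strength(shifted)

# Hypothetical 4-cycle 0-1-2-3-0 with the same coupling on every edge.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
strength = {frozenset((i, j)): n_ising for i in neighbors for j in neighbors[i]}
lhs = l1_condition_lhs(neighbors, strength)
```

On the 4-cycle every variable has two neighbors, so the inner sum in (49) contains a single term and the condition reduces to $\tanh|J| < 1$, which holds for every finite coupling.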

E. Factors containing zeros 

Until now, we have assumed that all factors are strictly 
positive. In many interesting applications of the Sum-Product 
Algorithm, this assumption is violated: the factors may contain 
zeros. It is thus interesting to see if and how our results can 
be extended towards this more general case. 

The easiest way to extend the results is by assuming that, although the factors may contain zeros, the messages are guaranteed to remain strictly positive (i.e. the log-messages remain finite) after each update (see footnote 8). Even more general extensions with milder conditions may exist, but we believe that considerably more work would be required to overcome the technical problems that arise due to messages containing zeros.

Assume that each factor $\psi^I$ is a nonnegative function $\psi^I : \prod_{i \in I} \mathcal{X}_i \to [0, \infty)$. In addition, assume that all factors involving only a single variable are strictly positive. This can be assumed without loss of generality, since the single variable factors that contain one or more zeros can simply be absorbed into multi-variable factors involving the same variable. Additionally, for each $I \in \mathcal{I}$ consisting of more than one variable, assume that



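$$\forall i \in I\ \ \forall x_i \in \mathcal{X}_i\ \ \exists x_{I \setminus i} :\ \psi^I(x_i, x_{I \setminus i}) > 0. \tag{51}$$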



These conditions guarantee that strictly positive messages remain strictly positive under the LBP update equations, as one easily checks, implying that we can still use the logarithmic parameterization of the messages and that the derivative (36) is still well-defined.

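The expression for the potential strength (45) can be written in a way that is also well-defined if the potential $\psi^I$ contains zeros:

$$N(\psi^I, i, j) := \sup_{\alpha \ne \alpha'}\ \sup_{\beta \ne \beta'}\ \sup_{\gamma,\gamma'}\ \frac{\sqrt{\psi^I_{\alpha\beta\gamma} \psi^I_{\alpha'\beta'\gamma'}} - \sqrt{\psi^I_{\alpha'\beta\gamma} \psi^I_{\alpha\beta'\gamma'}}}{\sqrt{\psi^I_{\alpha\beta\gamma} \psi^I_{\alpha'\beta'\gamma'}} + \sqrt{\psi^I_{\alpha'\beta\gamma} \psi^I_{\alpha\beta'\gamma'}}} \tag{52}$$

which is defined for $i, j \in I$ with $i \ne j$ and where $\psi^I_{\alpha\beta\gamma}$ is shorthand for $\psi^I(x_i = \alpha, x_j = \beta, x_{I \setminus \{i,j\}} = \gamma)$. For strictly positive potentials this coincides with (45), because $\tanh\left(\frac14 \log \frac{a}{b}\right) = \frac{\sqrt a - \sqrt b}{\sqrt a + \sqrt b}$ for $a, b > 0$.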
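A minimal sketch of the zero-tolerant form (52), restricted for brevity to the pairwise case (where $\gamma$ takes only one value). Skipping terms in which both square roots vanish is an assumption of this sketch, as are the function name and the example factors.

```python
import math


def pair_strength_with_zeros(psi):
    """Pairwise version of (52) for a nonnegative factor psi[a][b]."""
    best = 0.0
    n_a, n_b = len(psi), len(psi[0])
    for a in range(n_a):
        for a2 in range(n_a):
            for b in range(n_b):
                for b2 in range(n_b):
                    if a == a2 or b == b2:
                        continue
                    p = math.sqrt(psi[a][b] * psi[a2][b2])
                    q = math.sqrt(psi[a2][b] * psi[a][b2])
                    if p + q > 0.0:  # skip undefined 0/0 terms (assumption of this sketch)
                        best = max(best, (p - q) / (p + q))
    return best


soft = pair_strength_with_zeros([[1.0, 0.25], [0.25, 1.0]])   # (1 - 0.25) / (1 + 0.25)
hard = pair_strength_with_zeros([[1.0, 0.0], [0.0, 1.0]])     # identity factor
tanh_form = math.tanh(0.25 * math.log((1.0 * 1.0) / (0.25 * 0.25)))
```

For the strictly positive factor the value agrees with the $\tanh$ form of (48), while the identity factor, for which the logarithm in (48) would diverge, yields the finite value 1.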

The immediate generalization of Corollary 4 is then as follows:

Theorem 5: Under the assumptions on the potentials described above (strict positivity of single variable factors and (51) for the other factors): if the spectral radius of the matrix

$$A_{I \to i, J \to j} = 1_{N_j \setminus I}(J)\, 1_{I \setminus i}(j)\, N(\psi^I, i, j), \tag{53}$$

(with $N(\psi^I, i, j)$ defined in (52)) is strictly smaller than 1, LBP converges to a unique fixed point irrespective of the initial messages.

Proof: Similar to the strictly positive case. The only slight subtlety occurs in Appendix B, where one has to take a limit of strictly positive factors converging to the desired nonnegative factor and use the continuity of the relevant expressions with respect to the factor entries to prove that the bound also holds in this limit. □

8 Additionally, the initial messages are required to be strictly positive, but this requirement is easily met and is necessary for obtaining good LBP results.
1) Example: Define, for $\epsilon \ge 0$, the ("ferromagnetic") pair factor $\psi(\epsilon)$ by the following matrix:

$$\psi(\epsilon) := \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}.$$

Now consider a binary pairwise factor graph, consisting of a single loop of $N$ binary variables, i.e. the network topology is that of a circle. Take for the $N - 1$ pair interactions $\psi^{\{i,i+1\}}$ (for $i = 1, 2, \dots, N - 1$) the identity matrices (i.e. the above pair factors for $\epsilon = 0$) and take for the remaining one $\psi^{\{1,N\}} = \psi(\epsilon)$ for some $\epsilon \ge 0$. Note that the potential strength $N(\psi(\epsilon)) = \frac{1 - \epsilon}{1 + \epsilon}$ converges to 1 as $\epsilon \downarrow 0$. The spectral radius of the corresponding matrix $A_{I \to i, J \to j}$ can be shown to be equal to

$$\rho(A) = \left( \frac{1 - \epsilon}{1 + \epsilon} \right)^{1/N},$$

which is strictly smaller than 1 if and only if $\epsilon > 0$. Hence LBP converges to a unique fixed point if $\epsilon > 0$. This result is sharp, since for $\epsilon = 0$, LBP simply "rotates" the messages around without changing them and hence no convergence occurs (except, obviously, if the initial messages already correspond to the fixed point of uniform messages).
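The value of the spectral radius in this example can be checked numerically. The sketch below builds the matrix on the directed edges of a single loop, assuming the pairwise convention that the entry for the pair of messages $(i \to j, k \to l)$ equals $N(\psi^{ij})$ when $l = i$ and $k \ne j$ and zero otherwise, and then estimates the spectral radius through the Gelfand formula $\|A^n\|_1^{1/n}$. Variables are numbered from 0, the helper names are hypothetical, and the construction assumes a loop of at least three variables. For a general matrix the Gelfand expression at finite $n$ is only an upper estimate; here $A^n$ is a multiple of the identity whenever $n$ is a multiple of the loop length, so the estimate is exact.

```python
def loop_matrix(n, eps):
    """Matrix on the 2n directed edges of a single loop of n >= 3 binary variables."""
    weak = (1.0 - eps) / (1.0 + eps)      # potential strength of psi(eps)

    def strength(i, j):
        # the edge {0, n-1} carries psi(eps); all other edges carry identity factors
        return weak if {i, j} == {0, n - 1} else 1.0

    edges = [(i, (i + 1) % n) for i in range(n)] + [((i + 1) % n, i) for i in range(n)]
    index = {edge: pos for pos, edge in enumerate(edges)}
    A = [[0.0] * len(edges) for _ in edges]
    for (i, j) in edges:
        for (k, l) in edges:
            if l == i and k != j:
                A[index[(i, j)]][index[(k, l)]] = strength(i, j)
    return A


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def gelfand_estimate(A, power):
    """Return ||A^power||_1 ** (1/power), with the l1 norm as maximum column sum."""
    P = A
    for _ in range(power - 1):
        P = matmul(P, A)
    norm1 = max(sum(abs(P[r][c]) for r in range(len(P))) for c in range(len(P)))
    return norm1 ** (1.0 / power)


n, eps = 5, 0.1
estimate = gelfand_estimate(loop_matrix(n, eps), power=4 * n)
expected = ((1.0 - eps) / (1.0 + eps)) ** (1.0 / n)
```

With these hypothetical values both numbers are approximately 0.96, below 1 but close to it, in line with the slow convergence one would expect for a nearly deterministic loop.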



V. Comparison with other work 

In this section we explore the relations of our results with 
previously existing work. 



A. Comparison with work of Tatikonda and Jordan 

In [14], [15], a connection is made between two seemingly 
different topics, namely the Sum-Product Algorithm on the 
one hand and the theory of Gibbs measures [23] on the 
other hand. The main result of [14] states that LBP converges 
uniformly (to a unique fixed point) if the Gibbs measure on 
the corresponding computation tree (see footnote 9) is unique.

This is a remarkable and beautiful result; however, the 
question of convergence of LBP is replaced by the question 
of uniqueness of the Gibbs measure, which is not trivial. 
Fortunately, sufficient conditions for the uniqueness of the 
Gibbs measure exist; the most well-known are Dobrushin's 
condition and a weaker (but more easily verifiable) condition known
as Simon's condition. In combination with the main result 
of [14], they yield directly testable sufficient conditions for 
convergence of LBP to a unique fixed point. For reference, 
we will state both results in our notation below. For details, 
see [14], [15] and [23]. Note that the results are valid for the 
case of positive factors consisting of at most two variables and 
that it is not obvious whether they can be generalized. 

9 The computation tree is an "unwrapping" of the factor graph with respect 
to the Sum-Product Algorithm; specifically, the computation tree starting at 
variable $i \in \mathcal{N}$ consists of all paths starting at $i$ that never backtrack.



TECHNICAL REPORT, RADBOUD UNIVERSITY NIJMEGEN 






1) LBP convergence via Dobrushin's condition: Define Dobrushin's interdependence matrix as the $N \times N$ matrix $C$ with entries

$$C_{ij} := \sup_{x_{\partial i \setminus j}}\ \sup_{x_j, x_j'}\ \frac12 \sum_{x_i} \left| P(x_i \mid x_{\partial i \setminus j}, x_j) - P(x_i \mid x_{\partial i \setminus j}, x_j') \right| \tag{54}$$

for $j \in \partial i$ and 0 otherwise.

Theorem 6: For pairwise (positive) factors, LBP converges to a unique fixed point if

$$\max_{i \in \mathcal{N}} \sum_{j \in \partial i} C_{ij} < 1.$$

Proof: For a proof sketch, see [15]. For the proof of Dobrushin's condition see chapter 8 in [23]. □
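We can rewrite the conditional probabilities in terms of factors:

$$P(x_i \mid x_{\partial i \setminus j}, x_j) = \frac{\psi^i(x_i)\, \psi^{ij}(x_{ij}) \prod_{k \in \partial i \setminus j} \psi^{ik}(x_{ik})}{\sum_{x_i} \psi^i(x_i)\, \psi^{ij}(x_{ij}) \prod_{k \in \partial i \setminus j} \psi^{ik}(x_{ik})}.$$

Note that the complexity of the calculation of this quantity is generally exponential in the size of the neighborhood $\partial i$, which may prohibit practical application of Dobrushin's condition.

For the case of binary $\pm 1$-valued variables, some elementary algebraic manipulations yield

$$C_{ij} = \sup_{x_{\partial i \setminus j}} \frac{\sinh 2|J_{ij}|}{\cosh 2J_{ij} + \cosh 2\left(\theta_i + \sum_{k \in \partial i \setminus j} x_k J_{ik}\right)} = \frac{\tanh(|J_{ij}| - H_{ij}) + \tanh(|J_{ij}| + H_{ij})}{2}$$

with

$$H_{ij} := \inf_{x_{\partial i \setminus j}} \left| \theta_i + \sum_{k \in \partial i \setminus j} x_k J_{ik} \right|.$$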
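The closed form for the binary case can be checked against definition (54) by enumeration. The sketch below assumes the Ising parameterization $P(x) \propto \exp\left(\sum_{\{i,j\}} J_{ij} x_i x_j + \sum_i \theta_i x_i\right)$ with $x_i \in \{-1, +1\}$; the function names and the parameter values are hypothetical.

```python
import math
from itertools import product


def dobrushin_closed_form(J_ij, theta_i, other_J):
    """C_ij via (tanh(|J| - H) + tanh(|J| + H)) / 2, with H the smallest |local field|."""
    H = min(abs(theta_i + sum(x * J for x, J in zip(xs, other_J)))
            for xs in product((-1, 1), repeat=len(other_J)))
    a = abs(J_ij)
    return 0.5 * (math.tanh(a - H) + math.tanh(a + H))


def dobrushin_brute_force(J_ij, theta_i, other_J):
    """C_ij from definition (54), enumerating the states of the other neighbors."""
    def p_plus(field):
        # P(x_i = +1 | neighbors) when the total field acting on x_i equals `field`
        return 1.0 / (1.0 + math.exp(-2.0 * field))

    worst = 0.0
    for xs in product((-1, 1), repeat=len(other_J)):
        base = theta_i + sum(x * J for x, J in zip(xs, other_J))
        # for a binary variable, half the l1 distance between the two conditionals
        # equals the difference of the probabilities of x_i = +1
        worst = max(worst, abs(p_plus(base + J_ij) - p_plus(base - J_ij)))
    return worst


closed = dobrushin_closed_form(0.7, 0.3, [0.5, -0.2, 0.4])
brute = dobrushin_brute_force(0.7, 0.3, [0.5, -0.2, 0.4])
```

Both routines enumerate $2^{|\partial i| - 1}$ neighbor configurations, which illustrates the exponential cost mentioned above; the closed form only avoids the evaluation of the conditional probabilities themselves.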



2) LBP convergence via Simon's condition: Simon's condition is a sufficient condition for Dobrushin's condition (see proposition 8.8 in [23]). This leads to a looser, but more easily verifiable, bound:

Theorem 7: For pairwise (positive) factors, LBP converges to a unique fixed point if

$$\max_{i \in \mathcal{N}} \sum_{j \in \partial i} \left( \frac12 \sup_{\alpha,\alpha'}\ \sup_{\beta,\beta'}\ \log \frac{\psi^{ij}_{\alpha\beta}}{\psi^{ij}_{\alpha'\beta'}} \right) < 1.$$

□

It is not difficult to show that this bound is weaker than (49). Furthermore, unlike Dobrushin's condition and Corollary 4, it does not take into account single variable factors.
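A small sketch comparing Simon's condition with (49) for Ising pair factors $\exp(J_{ij} x_i x_j)$, for which the summand in Simon's condition reduces to $|J_{ij}|$ and the potential strength to $\tanh|J_{ij}|$. The graph representation and the function names are assumptions made for this illustration.

```python
import math


def simon_lhs(neighbors, J):
    """max_i sum_{j in di} |J_ij| (Simon's condition for Ising pair factors)."""
    return max(sum(abs(J[frozenset((i, j))]) for j in nbrs)
               for i, nbrs in neighbors.items())


def l1_lhs(neighbors, J):
    """max_i max_{k in di} sum_{j in di, j != k} tanh|J_ij| (condition (49))."""
    worst = 0.0
    for i, nbrs in neighbors.items():
        for k in nbrs:
            total = sum(math.tanh(abs(J[frozenset((i, j))])) for j in nbrs if j != k)
            worst = max(worst, total)
    return worst


# Hypothetical 4-cycle with uniform coupling 0.6 on every edge.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
J = {frozenset((i, j)): 0.6 for i in neighbors for j in neighbors[i]}
simon_value = simon_lhs(neighbors, J)
ours_value = l1_lhs(neighbors, J)
```

In this example Simon's condition fails (the sum is 1.2) while (49) holds (the sum is $\tanh 0.6 \approx 0.54$). Two effects contribute: $\tanh|J| \le |J|$, and one neighbor is excluded from the sum in (49).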

B. Comparison with work of Ihler et al. 

In the recent and independent work [16] of Ihler et al., a methodology was used which is very similar to the one used in this work. In particular, the same local $\ell_\infty$ quotient metric is used to derive sufficient conditions for LBP to be a contraction. In the work presented here, the Mean Value Theorem (in the form of Lemma 1) is used in combination with a bound on the derivative in order to obtain a bound on the convergence rate $K$ in (8). In contrast, in [16] a direct bound on the distance of two outgoing messages is derived in terms of the distance of two different products of incoming messages (equation (13) in [16]). This bound becomes relatively stronger as the distance of the products of incoming messages increases. This has the advantage that it can lead to stronger conclusions about the effect of finite message perturbations than would be possible with our bound, based on the Mean Value Theorem. However, for the question of convergence, the relevant limit turns out to be that of infinitesimal message perturbations, i.e. it suffices to study the derivative of the LBP updates as we have done here.

In the limit of infinitesimal message perturbations, the fundamental bound (13) in [16] leads to the following measure of potential strength:

$$D(\psi^{ij}) := \tanh\left( \frac12 \left( \sup_{\alpha,\beta}\ \sup_{\alpha',\beta'}\ \log \frac{\psi^{ij}_{\alpha\beta}}{\psi^{ij}_{\alpha'\beta'}} \right) \right).$$

Using this measure, Ihler et al. derive two different conditions for convergence of LBP. The first one is similar to our (49) and the second condition is equivalent to our spectral radius result (50), except that in both conditions, $D(\psi^{ij})$ is used instead of $N(\psi^{ij})$. The latter condition is formulated in [16] in terms of the convergence properties of an iterative BP-like algorithm. The equivalence of this formulation with a formulation in terms of the spectral radius of a matrix can be seen from the fact that for any square matrix $A$, $\rho(A) < 1$ if and only if $\lim_{n \to \infty} A^n = 0$. However, our result also gives a contraction rate, unlike the iterative formulation in [16].

Thus, the results in [16] are similar to ours in the pairwise case, except for the occurrence of $D(\psi^{ij})$ instead of $N(\psi^{ij})$. It is not difficult to see that $N(\psi^{ij}) \le D(\psi^{ij})$ for any pair factor $\psi^{ij}$; indeed, for any choice of $\alpha, \beta, \gamma, \delta$:

$$\frac{\psi^{ij}_{\alpha\gamma}\, \psi^{ij}_{\beta\delta}}{\psi^{ij}_{\beta\gamma}\, \psi^{ij}_{\alpha\delta}} \le \left( \frac{\sup_{\sigma\tau} \psi^{ij}_{\sigma\tau}}{\inf_{\sigma\tau} \psi^{ij}_{\sigma\tau}} \right)^2.$$

Thus the convergence results in [16] are similar to, but weaker than the results derived in the present work.
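The inequality $N(\psi^{ij}) \le D(\psi^{ij})$ can also be probed numerically. The following sketch assumes the two measures in the forms $N = \sup \tanh\left(\frac14 \log \frac{\psi_{\alpha\beta}\psi_{\alpha'\beta'}}{\psi_{\alpha'\beta}\psi_{\alpha\beta'}}\right)$ and $D = \tanh\left(\frac12 \log \frac{\sup \psi}{\inf \psi}\right)$; the function names, the random test factors and the fixed seed are choices made for this illustration only.

```python
import math
import random


def strength_N(psi):
    """Potential strength N of a strictly positive pair factor psi[a][b]."""
    best = 0.0
    for a in range(len(psi)):
        for a2 in range(len(psi)):
            for b in range(len(psi[0])):
                for b2 in range(len(psi[0])):
                    if a != a2 and b != b2:
                        ratio = psi[a][b] * psi[a2][b2] / (psi[a2][b] * psi[a][b2])
                        best = max(best, math.tanh(0.25 * math.log(ratio)))
    return best


def strength_D(psi):
    """Measure D: tanh of half the log-ratio of the largest to the smallest entry."""
    entries = [x for row in psi for x in row]
    return math.tanh(0.5 * math.log(max(entries) / min(entries)))


random.seed(0)
gaps = []
for _ in range(1000):
    psi = [[random.uniform(0.1, 10.0) for _ in range(3)] for _ in range(3)]
    gaps.append(strength_D(psi) - strength_N(psi))
smallest_gap = min(gaps)

# A factor that is a product of single-variable factors carries no interaction.
product_form = [[1.0, 2.0], [3.0, 6.0]]
n_product, d_product = strength_N(product_form), strength_D(product_form)
```

The product-form factor shows the difference most clearly: $N$ vanishes because the factor can be absorbed entirely into single variable factors, whereas $D$ equals $5/7$.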

After initial submission of this work, [17] was published, which improves upon [16] by exploiting the freedom of choice of the single node factors (which can be "absorbed" to an arbitrary amount by corresponding pair factors). This leads to an improved measure of potential strength, which turns out to be identical to our measure $N(\psi^{ij})$. Thus, for pairwise, strictly positive potentials, the results in [17] are equivalent to the results (49) and (50) presented here. Our most general results, Theorems 3, 4 and 5 and Corollary 4, are not present in [17].

C. Comparison with work of Heskes 

A completely different methodology to obtain sufficient 
conditions for the uniqueness of the LBP fixed point is used in 
[18]. By studying the Bethe free energy and exploiting the re- 
lationship between properties of the Bethe free energy and the 
LBP algorithm, conclusions are drawn about the uniqueness 
of the LBP fixed point; however, whether uniqueness of the 
fixed point also implies convergence of LBP seems to be an 
open question. We state the main result of [18] in our notation below.

The following measure of potential strength is used in [18]. For $I \in \mathcal{I}$, let

$$\omega_I := \sup_{x_I}\ \sup_{x_I'} \left( \log \psi^I(x_I) + (|I| - 1) \log \psi^I(x_I') - \sum_{i \in I} \log \psi^I(x_{I \setminus i}, x_i') \right).$$

The potential strength is then defined as $\sigma_I := 1 - e^{-\omega_I}$.

Theorem 8: LBP has a unique fixed point if there exists an "allocation matrix" $X_{Ii}$ between factors $I \in \mathcal{I}$ and variables $i \in \mathcal{N}$ such that

1) $X_{Ii} \ge 0$ for all $I \in \mathcal{I}$, $i \in I$;

2) $(1 - \sigma_I) \max_{i \in I} X_{Ii} + \sigma_I \sum_{i \in I} X_{Ii} \le 1$ for all $I \in \mathcal{I}$;

3) $\sum_{I \in N_i} X_{Ii} \ge |N_i| - 1$ for all $i \in \mathcal{N}$.

Proof: See Theorem 8.1 in [18]. □

The (non)existence of such a matrix can be determined using standard linear programming techniques.

VI. Numerical comparison of various bounds 

In this section, we compare various bounds on binary
pairwise graphical models, as defined earlier, for various choices
of the parameters. First we study the case of a completely
uniform model (i.e. full connectivity, uniform couplings and 
uniform local fields). Then we study non-uniform couplings 
Jij, in the absence of local fields. Finally, we take fully random 
models in various parameter regimes (weak/strong local fields, 
strong/weak ferromagnetic/spin-glass/anti-ferromagnetic cou- 
plings). 

A. Uniform couplings, uniform local field 

The fully connected Ising model consisting of $N$ binary $\pm 1$-valued variables with uniform couplings $J$ and uniform local field $\theta$ is special in the sense that an exact description of the parameter region for which the Gibbs measure on the computation tree is unique is available. Using the results of Tatikonda and Jordan, this yields a strong bound on the parameter region for which LBP converges to a unique fixed point. Indeed, the corresponding computation tree is a uniform Ising model on a Cayley tree of degree $N - 2$, for which (semi-)analytical expressions for the paramagnetic-ferromagnetic and paramagnetic-anti-ferromagnetic phase transition boundaries are known (see section 12.2 in [23]). Since the Gibbs measure is known to be unique in the paramagnetic phase, this gives an exact description of the $(J, \theta)$ region for which the Gibbs measure on the computation tree is unique, and hence a bound on LBP convergence on the original model.

In Fig. 4 we have plotted various bounds on LBP convergence in the $(J, \theta)$ plane for $N = 4$ (other values of $N$ yield qualitatively similar results). The gray area (g) marks regions where the Gibbs measure on the computation tree is not unique; in the white area, the Gibbs measure is unique and hence LBP is guaranteed to converge. Note that this bound is only available due to the high symmetry of the model. In [24] it is shown that parallel LBP does not converge in the lower (anti-ferromagnetic) gray region. In the upper (ferromagnetic) region on the other hand, parallel LBP does converge, but it may be that the fixed point is no longer unique.

Fig. 4. Comparison of various LBP convergence bounds for the fully connected $N = 4$ binary Ising model with uniform coupling $J$ and uniform local field $\theta$. (a) Heskes' condition (b) Simon's condition (c) spectral radius condition (d) Dobrushin's condition (e) improved spectral radius condition for $m = 1$ (f) improved spectral radius condition for $m = 5$ (g) uniqueness of Gibbs' measure condition. See the main text (section VI-A) for more explanation.

The various lines correspond to different sufficient conditions for LBP convergence; the regions enclosed by two lines of the same type (i.e. the inner regions for which $J$ is small) mark the regions of guaranteed convergence. The lightly dotted lines (a) correspond with Heskes' condition, Theorem 8. The dash-dotted lines (b) correspond with Simon's condition, Theorem 7. The dashed lines (d) correspond with Dobrushin's condition (Theorem 6), which is seen to improve upon Simon's condition for $\theta \ne 0$, but is nowhere sharp. The solid lines (c) correspond with the spectral radius condition Corollary 3 (which coincides with the $\ell_1$-norm based condition Corollary 2 in this case and is also equivalent to the result of [16]), which is independent of $\theta$ but is actually sharp for $\theta = 0$. The heavily dotted lines (e) correspond to Corollary 4 with $m = 1$, the +-shaped lines (f) to the same Corollary with $m = 5$. Both (e) and (f) are seen to coincide with (c) for small $\theta$, but improve for large $\theta$.

We conclude that the presence of local fields makes it more difficult to obtain sharp bounds on LBP convergence; only Dobrushin's condition (Theorem 6) and Corollary 4 take into account local fields. Furthermore, in this case, our result Corollary 4 is stronger than the other bounds. Note that the calculation of Dobrushin's condition is exponential in the number of variables $N$, whereas the time complexity of our bound is polynomial in $N$. Similar results are obtained for higher values of $N$.

B. Non-uniform couplings, zero local fields 

We have investigated in more detail the influence of the distribution of the couplings $J_{ij}$, in the absence of local fields, and have also compared with the empirical convergence behavior of LBP. We have taken a binary Ising model on a rectangular toroidal grid (i.e. with periodic boundary conditions) of size $10 \times 10$. The couplings were random independent normally distributed nearest-neighbor couplings $J_{ij} \sim \mathcal{N}(J_0, \sigma_J)$, the local fields were $\theta_i = 0$. Let $(r_J, \phi_J)$ be the polar coordinates corresponding to the Cartesian coordinates $(J_0, \sigma_J)$. For various angles $\phi_J \in [0, \pi]$, we have determined the critical radius $r_J$ for each bound. The results have been averaged over 40 instances of the model and can be found in Fig. 5. The lines correspond to the mean bounds, the gray areas are "error bars" of one standard deviation. The inner area (for which the couplings are small) bounded by each line means "convergence", either guaranteed or empirical (thus the larger the enclosed area, the tighter the bound). From bottom to top: the thin solid line (a) corresponds with Heskes' result (Theorem 8), the dash-dotted line (b) with Dobrushin's condition (Theorem 6), the dotted line (c) corresponds with the $\ell_1$-norm based condition Corollary 2, the dashed line (d) with the spectral radius condition Corollary 3 and the thick solid line (e) with the empirical convergence behavior of LBP.

Fig. 5. Comparison of various bounds for LBP convergence for toroidal Ising model of size $10 \times 10$ with normally distributed couplings $J_{ij} \sim \mathcal{N}(J_0, \sigma_J)$ and zero local fields. (a) Heskes' condition (b) Dobrushin's condition (c) $\ell_1$-norm condition (d) spectral radius condition (e) empirical convergence. See the main text (section VI-B) for more explanation.

We conclude from Fig. 5 that the spectral radius condition improves upon the $\ell_1$-norm based condition for non-uniform couplings and that the improvement can be quite substantial. For uniform couplings (and zero local fields), both conditions coincide and it can be proved that they are sharp [1].

C. Fully random models 

Finally, we have considered fully connected binary pairwise graphical models with completely random couplings and local fields (in various parameter regimes). We drew random couplings and local fields as follows: first, we drew i.i.d. random parameters $J_0, \sigma_J, \theta_0, \sigma_\theta$ from a normal distribution with mean 0 and variance 1. Then, for each variable $i$ we independently drew a local field parameter $\theta_i \sim \mathcal{N}(\theta_0, \sigma_\theta)$, and for each pair $\{i, j\}$ we independently drew a coupling parameter $J_{ij} \sim \mathcal{N}(J_0, \sigma_J)$.

TABLE I
Comparison of bounds (50000 trials, for N = 4 and N = 8)

N = 4                       Th. 6     Cor. 3    Th. 8     Cor. 4
Th. 6 [15]                  (5779)    170       3564      0
Cor. 3 [16]                 10849     (16458)   13905     0
Th. 8 [18]                  338       0         (2553)    0
Cor. 4, m = 1, this work    13820     3141      17046     (19599)

N = 8                       Th. 6     Cor. 3    Th. 8     Cor. 4
Th. 6 [15]                  (668)     39        597       0
Cor. 3 [16]                 507       (1136)    1065      0
Th. 8 [18]                  0         0         (71)      0
Cor. 4, m = 1, this work    972       504       1569      (1640)

For the resulting graphical model, we have verified whether various sufficient conditions for LBP convergence hold. If condition A holds whereas condition B does not hold, we say that A wins from B. We have counted for each ordered pair (A, B) of conditions how often A wins from B. The results (for 50000 random models consisting of $N = 4, 8$ variables) can be found in Table I: the number at row A, column B is the number of trials for which bound A wins from bound B. On the diagonal (A = B) is the total number of trials for which bound A predicts convergence. Theorem 6 is due to [15], Corollary 3 was first published (for the binary case) in [16] and Theorem 8 is due to [18].

Our result Corollary 4 (for $m = 1$) outperforms the other bounds in each trial. For other values of $N$, we obtain similar results.
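The bookkeeping behind such a comparison table can be sketched as follows. The input format (one dictionary per trial, mapping a condition name to whether that condition holds) and the condition names are hypothetical; the evaluation of the conditions themselves is not shown.

```python
def win_table(results):
    """Count, for each ordered pair (a, b), how often a holds while b does not.

    The diagonal entry (a, a) counts how often condition a holds at all.
    """
    names = sorted(results[0])
    table = {a: {b: 0 for b in names} for a in names}
    for outcome in results:
        for a in names:
            for b in names:
                if a == b:
                    table[a][b] += int(outcome[a])
                elif outcome[a] and not outcome[b]:
                    table[a][b] += 1      # a wins from b in this trial
    return table


# Three hypothetical trials with two conditions.
trials = [
    {"spectral": True, "simon": False},
    {"spectral": True, "simon": True},
    {"spectral": False, "simon": False},
]
table = win_table(trials)
```

A useful consistency check on such a table is that, for any two conditions A and B, the diagonal entry of A minus the number of wins of A from B equals the diagonal entry of B minus the number of wins of B from A, since both count the trials in which A and B hold simultaneously.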

VII. Discussion 

In this paper we have derived sufficient conditions for 
convergence of LBP to a unique fixed point. Our conditions are 
directly applicable to arbitrary graphical models with discrete 
variables and nonnegative factors. This is in contrast with 
the sufficient conditions of Tatikonda and Jordan and with 
the results of Ihler, Fisher and Willsky, which were only 
formulated for pairwise, positive factors. We have shown cases 
where our results are stronger than previously known sufficient 
conditions. 

Our numerical experiments lead us to conjecture that Corollary 4 is stronger than the other bounds. We have no proof for this conjecture at the moment, apart from the obvious fact that Corollary 3 is weaker than Corollary 4. To prove that Corollary 4 is stronger than Theorem 6 seems subtle, since it is generally not the case that $\rho(A) \le \|C\|_\infty$, although it seems that the weaker relation $\|C\|_\infty < 1 \implies \rho(A) < 1$ does hold in general. The relation with the condition in Theorem 8 is not evident either.

In the binary pairwise case, it turned out to be possible to derive sufficient conditions that take into account local evidence (Corollary 4). In the general case, such an improvement is possible in principle but seems to be more involved. The resulting optimization problem (essentially (43) with additional assumptions on $h$) looks difficult in general. If the variables' cardinalities and connectivities are small, the resulting optimization problem can be solved, but writing down a general solution does not appear to be trivial. The question of finding an efficient solution in the general case is left for future investigation.

The work reported here raises new questions, some of 
which have been (partially) answered elsewhere after the initial 
submission of this paper. The influence of damping the LBP 
update equations has been considered for the binary pairwise 
case in [25], where it was shown that damping has the most 
effect for anti-ferromagnetic interactions. Furthermore, it has 
been proved in [25] that the bounds for LBP convergence 
derived in the present work are sharp in the case of binary 
variables with (anti-)ferromagnetic pairwise interactions and 
zero local fields, as suggested by Fig. 5. An extension of the
results towards sequential update schemes has been given in 
[26]; it is shown that for each reasonable sequential update 
scheme, the same conditions for convergence to a unique fixed 
point as derived in this work apply. Likewise, in [24] it is 
shown that Dobrushin's condition is also valid for sequential 
LBP. 

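Appendix A
Generalizing the $\ell_1$-norm

Let $(V_i, \|\cdot\|_i)$ be a collection of normed vector spaces and let $V = \bigoplus_i V_i$ be the direct sum of the $V_i$'s. The function $\|\cdot\| : V \to \mathbb{R}$ defined by

$$\|v\| := \sum_i \|v_i\|_i \tag{A.55}$$

is a norm on $V$, as one easily checks. Let $A : V \to V$ be a linear mapping with "blocks" $A_{ij} : V_j \to V_i$ defined by

$$\forall v_j \in V_j : \quad A v_j = \sum_i A_{ij} v_j, \qquad A_{ij} v_j \in V_i$$

for all $j$.

Theorem 9: The matrix norm of $A$ induced by the vector norm $\|\cdot\|$ is given by:

$$\|A\| = \max_j \sum_i \|A_{ij}\|^i_j \tag{A.56}$$

where

$$\|A_{ij}\|^i_j := \sup_{x \in V_j,\ \|x\|_j \le 1} \|A_{ij} x\|_i.$$

Proof: Let $v_k \in V_k$ such that $\|v_k\|_k = 1$. Then

$$\|A v_k\| = \Big\| \sum_i A_{ik} v_k \Big\| = \sum_i \|A_{ik} v_k\|_i \le \sum_i \|A_{ik}\|^i_k \le \max_j \sum_i \|A_{ij}\|^i_j.$$

Now let $v \in V$ such that $\|v\| = 1$. Then $v$ can be written as the convex combination $v = \sum_k \|v_k\|_k\, \tilde v_k$, where

$$\tilde v_k := \begin{cases} v_k / \|v_k\|_k & \text{if } v_k \ne 0 \\ 0 & \text{if } v_k = 0. \end{cases}$$

Hence:

$$\|A v\| = \Big\| \sum_k \|v_k\|_k\, A \tilde v_k \Big\| \le \sum_k \|v_k\|_k\, \|A \tilde v_k\| \le \max_j \sum_i \|A_{ij}\|^i_j.$$

It is evident that this value is also achieved for some $v \in V$ with $\|v\| = 1$. □

An illustrative example is obtained by considering $V = \mathbb{R}^N$ to be the direct sum of $N$ copies of $\mathbb{R}$ with the absolute value as norm; then the norm (A.55) on $\mathbb{R}^N$ is simply the $\ell_1$-norm and the induced matrix norm (A.56) reduces to (9).

Suppose that each $V_i$ has a linear subspace $W_i$. We can consider the quotient spaces $V_i / W_i$ with quotient norms $\|\cdot\|_i$. The direct sum $W := \bigoplus_i W_i$ is itself a subspace of $V$, yielding a quotient space $V/W$. For $v \in V$ we have $\overline{v} = \sum_i \overline{v_i}$ and hence $V/W = \bigoplus_i (V_i / W_i)$. The quotient norm on $V/W$ is simply the sum of the quotient norms on the $V_i / W_i$:

$$\|\overline{v}\| := \inf_{w \in W} \|v + w\| = \inf_{w \in W} \sum_i \|v_i + w_i\|_i = \sum_i \inf_{w_i \in W_i} \|v_i + w_i\|_i = \sum_i \|\overline{v_i}\|_i. \tag{A.57}$$

Let $A : V \to V$ be a linear mapping such that $AW \subseteq W$. Then $A$ induces a linear $\overline{A} : V/W \to V/W$; since $A_{ij} W_j \subseteq W_i$, each block $A_{ij} : V_j \to V_i$ induces a linear $\overline{A_{ij}} : V_j / W_j \to V_i / W_i$, and $\overline{A}$ can be regarded as consisting of the blocks $\overline{A_{ij}}$.

Corollary 6: The matrix norm of $\overline{A} : V/W \to V/W$ induced by the quotient norm $\|\cdot\|$ on $V/W$ is:

$$\|\overline{A}\| = \max_j \sum_i \|\overline{A_{ij}}\|^i_j \tag{A.58}$$

where

$$\|\overline{A_{ij}}\|^i_j = \sup_{x \in V_j,\ \|x\|_j \le 1} \|\overline{A_{ij} x}\|_i. \tag{A.59}$$

Proof: We can directly apply the previous Theorem to the quotient spaces to obtain (A.58); because

$$\{\overline{x} \in V_j / W_j : \|\overline{x}\|_j \le 1\} = \{\overline{x} : x \in V_j,\ \|x\|_j \le 1\},$$

we have:

$$\|\overline{A_{ij}}\|^i_j := \sup_{\overline{x} \in V_j / W_j,\ \|\overline{x}\|_j \le 1} \|\overline{A_{ij}}\, \overline{x}\|_i = \sup_{x \in V_j,\ \|x\|_j \le 1} \|\overline{A_{ij} x}\|_i.$$

□

For a linear $A : V \to V$ such that $AW \subseteq W$, we define the matrix $|A|$ with entries $|A|_{ij} := \|\overline{A_{ij}}\|^i_j$. Let $A, B$ be two such linear mappings; then

$$|AB|_{ij} = \left\| \overline{(AB)_{ij}} \right\|^i_j = \Big\| \sum_k \overline{A_{ik} B_{kj}} \Big\|^i_j \le \sum_k \left\| \overline{A_{ik}}\, \overline{B_{kj}} \right\|^i_j \le \sum_k \|\overline{A_{ik}}\|^i_k\, \|\overline{B_{kj}}\|^k_j = \sum_k |A|_{ik} |B|_{kj},$$

hence $|AB| \le |A||B|$. Note that $\| |A| \|_1 = \|\overline{A}\|$. We can generalize Theorem 2:

Theorem 10: Let $f : V \to V$ be differentiable and suppose that it satisfies (30). Suppose further that $|f'(v)| \le A$ for some matrix $A_{ij}$ (which does not depend on $v$) with $\rho(A) < 1$. Then for any $\overline{v} \in V/W$, the sequence $\overline{v}, \overline{f}(\overline{v}), \overline{f}^2(\overline{v}), \dots$ obtained by iterating $\overline{f}$ converges to a unique fixed point $\overline{v}_\infty$.

Proof: Using the chain rule, we have for any $n = 1, 2, \dots$ and any $v \in V$:

$$\left\| \overline{(f^n)'(v)} \right\| = \Big\| \overline{\prod_{i=1}^n f'\big(f^{i-1}(v)\big)} \Big\| = \Big\| \Big| \prod_{i=1}^n f'\big(f^{i-1}(v)\big) \Big| \Big\|_1 \le \Big\| \prod_{i=1}^n \big| f'\big(f^{i-1}(v)\big) \big| \Big\|_1 \le \|A^n\|_1.$$

By the Gelfand Spectral Radius Theorem, $\left( \|A^n\|_1 \right)^{1/n} \to \rho(A)$ for $n \to \infty$. Choose $\epsilon > 0$ such that $\rho(A) + \epsilon < 1$. For some $N$, $\|A^N\|_1 \le (\rho(A) + \epsilon)^N < 1$. Hence $\left\| \overline{(f^N)'(v)} \right\| < 1$ for all $\overline{v} \in V/W$. By Lemma 2, $\overline{f}^N$ is a contraction with respect to the quotient norm on $V/W$. Now apply Lemma 3. □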
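The role of the Gelfand formula in this argument can be illustrated with a small nonnegative matrix whose $\ell_1$-induced norm exceeds 1 while its spectral radius is below 1, so that some power of the matrix, but not the matrix itself, is contractive. The matrix and the helper names below are hypothetical and serve only as an illustration.

```python
import math


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def norm1(M):
    """Matrix norm induced by the l1 vector norm: the maximum absolute column sum."""
    return max(sum(abs(M[r][c]) for r in range(len(M))) for c in range(len(M[0])))


def first_contractive_power(A, max_power=100):
    """Smallest n <= max_power with ||A^n||_1 < 1, or None if there is none."""
    P = A
    for n in range(1, max_power + 1):
        if norm1(P) < 1.0:
            return n
        P = matmul(P, A)
    return None


A = [[0.0, 1.5], [0.2, 0.0]]       # eigenvalues are +-sqrt(0.3)
power = first_contractive_power(A)

P10 = A
for _ in range(9):
    P10 = matmul(P10, A)
gelfand = norm1(P10) ** (1.0 / 10)  # approaches the spectral radius
```

Here $\|A\|_1 = 1.5$, so a norm-based condition in the style of Theorem 3 would not apply, but $\|A^2\|_1 = 0.3 < 1$, which is the situation exploited in the proof above.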

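Appendix B
Proof that (43) equals (44)

Let $\psi_{\beta\gamma}$ be a matrix of positive numbers. Let

$$\mathcal{H} := \Big\{ h : h_{\beta\gamma} \ge 0,\ \sum_{\beta,\gamma} h_{\beta\gamma} = 1 \Big\}.$$

Define the function $g : \mathcal{H} \to \mathbb{R}$ by

$$g(h) = \sum_\beta \left| \sum_\gamma h_{\beta\gamma} \left( \frac{\psi_{\beta\gamma}}{\sum_{\beta,\gamma} \psi_{\beta\gamma} h_{\beta\gamma}} - 1 \right) \right|.$$

Theorem 11:

$$\sup_{h \in \mathcal{H}} g(h) = 2 \sup_{\beta \ne \beta'}\ \sup_{\gamma,\gamma'}\ \tanh\left( \frac14 \log \frac{\psi_{\beta\gamma}}{\psi_{\beta'\gamma'}} \right).$$

Proof: First note that we can assume without loss of generality that all $\psi_{\beta\gamma}$ are different, because of continuity. Define

$$\psi_- := \inf_{\beta\gamma} \psi_{\beta\gamma}, \qquad \psi_+ := \sup_{\beta\gamma} \psi_{\beta\gamma},$$

$$X := [\psi_-, \psi_+], \qquad X' := X \setminus \{\psi_{\beta\gamma} : \beta, \gamma\}.$$

For $\Psi \in X$, define

$$\mathcal{H}_\Psi := \Big\{ h \in \mathcal{H} : \sum_{\beta,\gamma} \psi_{\beta\gamma} h_{\beta\gamma} = \Psi \Big\},$$

which is evidently a closed convex set. The function

$$g_\Psi : \mathcal{H}_\Psi \to \mathbb{R} : h \mapsto \sum_\beta \left| \sum_\gamma h_{\beta\gamma} \left( \frac{\psi_{\beta\gamma}}{\Psi} - 1 \right) \right|$$

obtained by restricting $g$ to $\mathcal{H}_\Psi$ is convex. Hence it achieves its maximum on an extremal point of its domain.

Define

$$\mathcal{H}_2 := \big\{ h \in \mathcal{H} : \#\{(\beta, \gamma) : h_{\beta\gamma} > 0\} = 2 \big\}$$

as those $h \in \mathcal{H}$ with exactly two non-zero components. For $h \in \mathcal{H}_2$, define $\psi_-(h) := \inf\{\psi_{\beta\gamma} : h_{\beta\gamma} \ne 0\}$ and $\psi_+(h) := \sup\{\psi_{\beta\gamma} : h_{\beta\gamma} \ne 0\}$. Because of continuity, we can restrict ourselves to the $\Psi \in X'$, in which case the extremal points of $\mathcal{H}_\Psi$ are precisely $\mathcal{H}^*_\Psi = \mathcal{H}_\Psi \cap \mathcal{H}_2$ (i.e. the extremal points have exactly two non-zero components).

Now

$$\sup_{h \in \mathcal{H}} g(h) = \sup_{\Psi \in X}\ \sup_{h \in \mathcal{H}_\Psi} g_\Psi(h) = \sup_{\Psi \in X'}\ \sup_{h \in \mathcal{H}^*_\Psi} g_\Psi(h) = \sup_{h \in \mathcal{H}_2}\ \sup_{\psi_-(h) \le \Psi \le \psi_+(h)} g_\Psi(h) = \sup_{h \in \mathcal{H}_2} g(h).$$

For those $h \in \mathcal{H}_2$ with components with different $\beta$, we can use the Lemma below. The $h \in \mathcal{H}_2$ with components with equal $\beta$ are suboptimal, since the two contributions in the sum over $\gamma$ in $g(h)$ have opposite sign. Hence

$$\sup_{h \in \mathcal{H}_2} g(h) = 2 \sup_{\beta \ne \beta'}\ \sup_{\gamma,\gamma'}\ \tanh\left( \frac14 \log \frac{\psi_{\beta\gamma}}{\psi_{\beta'\gamma'}} \right).$$

□

Lemma 4: Let $0 < a < b$. Then

$$\sup_{\eta \in (0,1)^2,\ \eta_1 + \eta_2 = 1} \left\{ \eta_1 \left| \frac{a}{\eta_1 a + \eta_2 b} - 1 \right| + \eta_2 \left| \frac{b}{\eta_1 a + \eta_2 b} - 1 \right| \right\} = 2 \tanh\left( \frac14 \log \frac{b}{a} \right) = 2\, \frac{\sqrt{b} - \sqrt{a}}{\sqrt{b} + \sqrt{a}}.$$

Proof: Elementary. The easiest way to see this is to reparameterize $\eta = \left( \frac{e^{\nu}}{2 \cosh \nu}, \frac{e^{-\nu}}{2 \cosh \nu} \right)$ with $\nu \in (-\infty, \infty)$. □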
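Lemma 4 lends itself to a quick numerical sanity check. The sketch below assumes the objective $\eta_1 \left| \frac{a}{\eta_1 a + \eta_2 b} - 1 \right| + \eta_2 \left| \frac{b}{\eta_1 a + \eta_2 b} - 1 \right|$ with $\eta_2 = 1 - \eta_1$ and compares a grid search over $\eta_1$ with the closed form $2 \tanh\left(\frac14 \log \frac{b}{a}\right)$. The function names, the grid resolution and the example values of $a$ and $b$ are arbitrary choices for this illustration.

```python
import math


def lemma4_objective(eta1, a, b):
    """Objective for a two-point weight vector (eta1, 1 - eta1)."""
    eta2 = 1.0 - eta1
    mean = eta1 * a + eta2 * b
    return eta1 * abs(a / mean - 1.0) + eta2 * abs(b / mean - 1.0)


def lemma4_sup_by_grid(a, b, steps=100000):
    """Approximate the supremum over eta1 in (0, 1) on a uniform grid."""
    return max(lemma4_objective(t / steps, a, b) for t in range(1, steps))


def lemma4_closed_form(a, b):
    return 2.0 * math.tanh(0.25 * math.log(b / a))


a, b = 1.0, 4.0
grid_value = lemma4_sup_by_grid(a, b)
closed_value = lemma4_closed_form(a, b)
sqrt_form = 2.0 * (math.sqrt(b) - math.sqrt(a)) / (math.sqrt(b) + math.sqrt(a))
```

For $a = 1$, $b = 4$ all three values equal $2/3$ up to the grid resolution, and the maximizing weights satisfy $\eta_1 / \eta_2 = \sqrt{b/a} = 2$.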